Carnegie Mellon University
Browse

A statistical approach to algorithmic analysis of high-dimensional nearest-neighbor search

Download (1.99 MB)
journal contribution
posted on 2003-07-01, 00:00 authored by Alexander Gray
Abstract: "The most commonly used algorithms for spatial data searches such as k-nearest-neighbor and spherical range queries are based on a class of data structures we call space-partitioning trees, which have remained the pragmatic method of choice due to their ability to often empirically provide sub-linear efficiency in reported dimensionalities in the tens and occasionally beyond, in contrast to methods designed for worst-case optimality. Despite long-standing practical interest in a more realistic runtime analysis of such methods, particularly in the high-dimensional case demanded by many modern applications, little further progress has been made since the seminal expected-time analysis of 1977. One fundamental reason for this is that algorithm analysis has not, to date, provided examples of analyses which link algorithmic runtime to probabilistic properties of the input distribution. This paper introduces some basic statistical machinery for making this link, and thereby presents initial steps toward providing a statistically principled framework for distribution-dependent runtime analysis of space-partitioning-based algorithms, with an emphasis on providing explanations for their observed behavior in high-dimensional spaces."

History

Date

2003-07-01

Usage metrics

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC