A statistical approach to algorithmic analysis of high-dimensional nearest-neighbor search
journal contributionposted on 01.07.2003, 00:00 by Alexander Gray
Abstract: "The most commonly used algorithms for spatial data searches such as k-nearest-neighbor and spherical range queries are based on a class of data structures we call space-partitioning trees, which have remained the pragmatic method of choice due to their ability to often empirically provide sub-linear efficiency in reported dimensionalities in the tens and occasionally beyond, in contrast to methods designed for worst-case optimality. Despite long-standing practical interest in a more realistic runtime analysis of such methods, particularly in the high-dimensional case demanded by many modern applications, little further progress has been made since the seminal expected-time analysis of 1977. One fundamental reason for this is that algorithm analysis has not, to date, provided examples of analyses which link algorithmic runtime to probabilistic properties of the input distribution. This paper introduces some basic statistical machinery for making this link, and thereby presents initial steps toward providing a statistically principled framework for distribution-dependent runtime analysis of space-partitioning-based algorithms, with an emphasis on providing explanations for their observed behavior in high-dimensional spaces."