posted on 2003-01-01, 00:00authored byChristos Faloutsos, Volker Gaede
There is mounting evidence [Man77, Sch91] that real datasets are statistically self-similar, and thus, `fractal'. This is an important insight since it permits a compact statistical description of spatial datasets; subsequently, as we show, it also forms the basis for the theoretical analysis of spatial access methods, without using the typical, but unrealistic, uniformity assumption.
In this paper, we focus on the estimation of the number of number of quadtree blocks that a real, spatial dataset will require. Using the the well-known Hausdorff fractal dimension, we derive some closed formulas which allow us to predict the number of quadtree blocks, given some few parameters. Using our formulas, it is possible to predict the space overhead and the response time of linear quadtrees/z-orderings [OM88], which are widely used in practice. In order to verify our analytical model, we performed an extensive experimental investigation using several real datasets coming from different domains. In these experiments, we found that our analytical model agrees well with our experiments as well as with older empirical observations on 2-d [Gae95b] and 3-d [ACF+94] data.