posted on 2001-11-01, 00:00authored byChristos Faloutsos, Yossi Matias, Avi Silberschatz
The focus of this paper is on the characterization of the skewness of an attribute-value distribution and on the extrapolations for interesting parameters. More specifically, given a vector with the highest h multiplicities mvec = (m1, m2, ..., mh), and some frequency moments Fq =\sum miq , (e.g., q=0, 2), we provide effective schemes for obtaining estimates about either its statistics or subsets/supersets of the relation.
We assume an 80/20 law, and specifically, a p/(1-p) law. This law gives a distribution which is commonly known in the fractals literature as `multifractal'. We show how to estimate p from the given information (first few multiplicities and a few moments), and present the results of our experimentations on real data. Our results demonstrate that schemes based on our multifractal assumption consistently outperforms those schemes based on the uniformity assumption, which are commonly used in current DBMSs. Moreover, our schemes can be used to provide estimates for supersets of a relation, which the uniformity assumption based schemes can not not provide at all.