The dimensionality curse and dimensionality reduction are two issues of enduring interest in data mining, machine learning, multimedia indexing, and clustering. We present a fast, scalable algorithm that selects the most important attributes (dimensions) of a given set of n-dimensional
vectors. In contrast to previous methods, ours has the following desirable properties: (a) it does not
rotate the attributes, so the selected attributes remain easy to interpret; (b) it can spot
attributes that have nonlinear correlations; (c) it requires a constant number of passes over the dataset;
(d) it gives a good estimate on how many attributes we should keep.
The idea is to use the ‘fractal’ dimension of a dataset as a good approximation of its intrinsic
dimension, and to drop those attributes whose removal leaves it essentially unchanged. We applied our method to real and
synthetic datasets, where it ran fast and gave good results.
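To make the idea concrete, here is a minimal sketch, not the authors' published implementation: it estimates the correlation fractal dimension D2 by box-counting and then greedily drops the attribute whose removal changes D2 the least relative to the full dataset. The names fractal_dimension and select_attributes, the scale count n_scales, and the tolerance tol are illustrative assumptions.

```python
# Hedged sketch of fractal-dimension-based attribute selection.
# Assumptions: box-counting estimate of D2; greedy backward elimination.
import numpy as np

def fractal_dimension(X: np.ndarray, n_scales: int = 5) -> float:
    """Correlation fractal dimension D2 via box-counting.

    For grid cell side r, S(r) = sum over occupied cells of count^2;
    D2 is the slope of log S(r) versus log r in the linear region.
    """
    # Normalize each attribute to [0, 1] so one grid covers all scales.
    X = (X - X.min(axis=0)) / np.ptp(X, axis=0).clip(min=1e-12)
    log_r, log_s = [], []
    for k in range(1, n_scales + 1):
        r = 2.0 ** -k                              # grid cell side
        cells = np.floor(X / r).astype(np.int64)   # cell coordinates
        _, counts = np.unique(cells, axis=0, return_counts=True)
        log_r.append(np.log(r))
        log_s.append(np.log(np.sum(counts.astype(float) ** 2)))
    slope, _ = np.polyfit(log_r, log_s, 1)         # least-squares slope
    return slope

def select_attributes(X: np.ndarray, tol: float = 0.05) -> list[int]:
    """Backward elimination: drop attributes while D2 barely changes."""
    keep = list(range(X.shape[1]))
    base = fractal_dimension(X)                    # D2 of the full dataset
    while len(keep) > 1:
        # Try removing each remaining attribute; pick the least harmful.
        trials = [(abs(fractal_dimension(X[:, [j for j in keep if j != i]])
                       - base), i) for i in keep]
        delta, victim = min(trials)
        if delta > tol * max(base, 1e-12):         # removal distorts D2: stop
            break
        keep.remove(victim)
    return keep

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = rng.random(2000)
    # Three attributes: x1 = sin(2*pi*x0) is nonlinearly correlated with
    # x0 (redundant), while x2 is independent noise (not redundant).
    X = np.column_stack([t, np.sin(2 * np.pi * t), rng.random(2000)])
    print("D2 of full dataset:", round(fractal_dimension(X), 2))
    print("attributes kept:", select_attributes(X))
```

On this toy data the D2 estimate should come out near 2, and the sketch should keep two attributes: one of the nonlinearly correlated pair plus the independent one, matching the intrinsic dimension of the data and illustrating properties (b) and (d) above.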