For discovering hidden (latent) variables in real-world, non-Gaussian
data streams or in an n-dimensional cloud of data points, SVD
suffers from its orthogonality constraint. Our proposed method, “AutoSplit”,
finds features that are mutually independent and need not be
orthogonal. Thus, (a) it finds more meaningful hidden
variables and features, (b) it can easily lead to clustering and segmentation,
(c) it surprisingly scales linearly with the database size and
(d) it can also operate in on-line, single-pass mode. We also propose
“Clustering-AutoSplit”, which extends the feature discovery to multiple
feature/basis sets, and leads to clean clustering. Experiments on multiple,
real-world data sets show that our method has all of the properties
above and outperforms the state-of-the-art SVD.
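
As a rough illustration of the orthogonality constraint mentioned above, the sketch below mixes two independent, non-Gaussian (uniform) hidden variables through a non-orthogonal basis and compares what SVD returns with what an independence-seeking search returns. It is not the AutoSplit algorithm: the independence search is a generic stand-in (whitening followed by a brute-force rotation search that maximizes absolute excess kurtosis), and the function name, the 30-degree basis, and the sample size are all assumptions chosen for the example.

```python
import numpy as np


def svd_vs_independent_directions(n=20000, seed=0):
    """Compare SVD directions with independence-seeking directions (toy 2-D case)."""
    rng = np.random.default_rng(seed)

    # Two independent, non-Gaussian (uniform) hidden variables.
    S = rng.uniform(-1.0, 1.0, size=(n, 2))

    # Hypothetical non-orthogonal basis: rows are unit vectors 30 degrees apart.
    B = np.array([[1.0, 0.0],
                  [np.cos(np.pi / 6), np.sin(np.pi / 6)]])
    X = S @ B
    X = X - X.mean(axis=0)

    # SVD: the right singular vectors are orthogonal by construction.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)

    # Whitening: make the data uncorrelated with unit variance.
    cov = X.T @ X / n
    evals, evecs = np.linalg.eigh(cov)
    W = evecs @ np.diag(evals ** -0.5) @ evecs.T  # symmetric whitening matrix
    Z = X @ W

    # Stand-in for an independence criterion: search the rotation of the
    # whitened data that maximizes the summed absolute excess kurtosis.
    best_R, best_score = np.eye(2), -np.inf
    for theta in np.linspace(0.0, np.pi / 2, 901):
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s], [s, c]])
        Y = Z @ R
        score = np.abs((Y ** 4).mean(axis=0) - 3.0).sum()
        if score > best_score:
            best_R, best_score = R, score

    # Y = X @ W @ R, so X = Y @ inv(W @ R): the rows of inv(W @ R) are the
    # estimated basis directions (up to order, sign, and scale).
    B_est = np.linalg.inv(W @ best_R)
    B_est = B_est / np.linalg.norm(B_est, axis=1, keepdims=True)
    return Vt, B_est, B


if __name__ == "__main__":
    Vt, B_est, B = svd_vs_independent_directions()
    print("SVD directions, |cosine| of angle:        ", abs(Vt[0] @ Vt[1]))
    print("Independent directions, |cosine| of angle:", abs(B_est[0] @ B_est[1]))
    print("True basis, |cosine| of angle:            ", abs(B[0] @ B[1]))
```

In this toy setting the SVD directions are always at right angles, whatever the generating basis looks like, while the independence-based estimate is free to follow the non-orthogonal basis. The brute-force rotation search only works in two dimensions and is used here purely for readability.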