file.pdf (487.46 kB)

Treelets —An Adaptive Multi-Scale Basis for Sparse Unordered Data

Download (487.46 kB)
journal contribution
posted on 01.04.2012 by Ann B. Lee, Boaz Nadler, Larry Wasserman

In many modern applications, including analysis of gene expression and text documents, the data are noisy, high-dimensional, and unordered -- with no particular meaning to the given order of the variables. Yet, successful learning is often possible due to sparsity; the fact that the data are typically redundant with underlying structures that can be represented by only a few features. In this paper we present treelets -- a novel construction of multi-scale orthonormal bases that extends wavelets to non-smooth signals. Treelets capture the internal structure of the data and can as a dimensionality reduction tool significantly improve inference and prediction. We examine a variety of situations where treelets outperform principal component analysis and some common variable selection methods. The proposed method is illustrated on a linear mixture model, and on two real data sets: internet advertisements and DNA microarray data.