Carnegie Mellon University

Treelets -- An Adaptive Multi-Scale Basis for Sparse Unordered Data

journal contribution
Posted on 2012-04-01; authored by Ann B. Lee, Boaz Nadler, Larry Wasserman

In many modern applications, including the analysis of gene expression data and text documents, the data are noisy, high-dimensional, and unordered -- with no particular meaning to the given order of the variables. Yet successful learning is often possible due to sparsity: the data are typically redundant, with underlying structures that can be represented by only a few features. In this paper we present treelets -- a novel construction of multi-scale orthonormal bases that extends wavelets to non-smooth signals. Treelets capture the internal structure of the data and, used as a dimensionality reduction tool, can significantly improve inference and prediction. We examine a variety of situations where treelets outperform principal component analysis and some common variable selection methods. The proposed method is illustrated on a linear mixture model and on two real data sets: internet advertisements and DNA microarray data.
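The abstract describes treelets only at a high level. The Python sketch below illustrates one reading of the construction. It assumes a greedy scheme: at each level, the two most similar "sum" variables are decorrelated by a 2-by-2 Jacobi rotation (a local PCA). The higher-variance output stays a sum variable, and the other is retired as a fixed "difference" variable. The product of the rotations gives an orthonormal basis and a merge tree. The similarity measure (absolute correlation), the helper names (`covariance`, `matmul`, `treelet_sketch`), and the `levels` parameter are assumptions made for illustration. This is not the authors' reference implementation.

```python
import math


def covariance(rows):
    """Sample covariance matrix (divisor n - 1) of a list of observations."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def matmul(a, b):
    """Plain-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def identity(p):
    return [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]


def treelet_sketch(rows, levels=None):
    """Greedy pairwise-rotation (treelet-style) basis; hypothetical sketch.

    Returns (vectors, merges, sigma): vectors[i] is the i-th orthonormal basis
    vector, merges holds (sum_index, difference_index, theta) for each level,
    and sigma is the covariance of the data expressed in the final basis.
    """
    sigma = covariance(rows)
    p = len(sigma)
    basis = identity(p)      # columns are the current basis vectors
    active = list(range(p))  # indices of the current "sum" variables (kept sorted)
    merges = []
    max_levels = p - 1 if levels is None else min(levels, p - 1)
    for _ in range(max_levels):
        # Assumed similarity: absolute correlation between active sum variables.
        best, pair = -1.0, None
        for x in range(len(active)):
            for y in range(x + 1, len(active)):
                i, j = active[x], active[y]
                denom = math.sqrt(sigma[i][i] * sigma[j][j])
                rho = abs(sigma[i][j]) / denom if denom > 0 else 0.0
                if rho > best:
                    best, pair = rho, (i, j)
        a, b = pair
        # Jacobi angle that zeroes Cov(new_a, new_b); using atan2 places the
        # larger-variance component in slot a (the "sum" variable).
        theta = 0.5 * math.atan2(2.0 * sigma[a][b], sigma[a][a] - sigma[b][b])
        c, s = math.cos(theta), math.sin(theta)
        rot = identity(p)
        rot[a][a], rot[a][b], rot[b][a], rot[b][b] = c, -s, s, c
        sigma = matmul(matmul(transpose(rot), sigma), rot)  # R^T Sigma R
        basis = matmul(basis, rot)                          # B R stays orthonormal
        active.remove(b)  # b becomes a fixed "difference" variable
        merges.append((a, b, theta))
    return transpose(basis), merges, sigma
```

For example, with four variables where the first two are nearly collinear, `treelet_sketch(rows)` merges indices 0 and 1 at the first level. For readability, the sketch forms full p-by-p rotation matrices at every level. A practical version would update only the two affected rows and columns. Choosing how many levels to run and which basis vectors to keep is left to the caller here.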

