Posted on 1994-05-01, 00:00. Authored by Han Liu, John D. Lafferty, Larry Wasserman
We consider the problem of estimating the
joint density of a d-dimensional random vector X = (X1, X2, ..., Xd) when d is large.
We assume that the density is a product of
a parametric component and a nonparametric component which depends on an unknown
subset of the variables. Using a modification
of a recently developed nonparametric regression framework called rodeo (regularization of
derivative expectation operator), we propose
a method to greedily select bandwidths in a
kernel density estimate. It is shown empirically that the density rodeo works well even
for very high-dimensional problems. When
the unknown density function satisfies a suitably defined sparsity condition, and the
parametric baseline density is smooth, the approach is shown to achieve near-optimal
minimax rates of convergence, and thus avoids
the curse of dimensionality.
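
The greedy bandwidth selection described above can be illustrated with a minimal sketch. The abstract does not specify the kernel, the test threshold, or the shrinkage schedule, so everything below is an assumption made for illustration: a product Gaussian kernel, evaluation at a single query point x, a threshold of the form s_j * sqrt(2 log(n c_n)) with c_n = log n, a shrink factor beta, a starting bandwidth h0, and simultaneous updates of all active dimensions. The parametric baseline component is omitted, so the estimator reduces to a plain kernel density estimate. The function names (kernel_weights, kde, bandwidth_derivatives, density_rodeo) are hypothetical and not taken from the paper.

```python
import numpy as np


def kernel_weights(X, x, h):
    """Product Gaussian kernel value for each sample, shape (n,)."""
    u = (x - X) / h
    return np.prod(np.exp(-0.5 * u**2) / (np.sqrt(2.0 * np.pi) * h), axis=1)


def kde(X, x, h):
    """Kernel density estimate at the point x with per-dimension bandwidths h."""
    return kernel_weights(X, x, h).mean()


def bandwidth_derivatives(X, x, h):
    """Per-sample derivative of the kernel product w.r.t. each h_j, shape (n, d).

    For a Gaussian kernel, d/dh [phi(u/h) / h] = [phi(u/h) / h] * (u^2 - h^2) / h^3.
    """
    w = kernel_weights(X, x, h)
    diff = x - X
    return w[:, None] * (diff**2 - h**2) / h**3


def density_rodeo(X, x, h0=1.0, beta=0.9, max_iter=200):
    """Greedy bandwidth selection at x (illustrative sketch, not the authors' code).

    Start with a large bandwidth in every dimension. Keep shrinking h_j while the
    estimated derivative of the density estimate w.r.t. h_j is significantly
    non-zero; freeze h_j the first time it is not.
    """
    n, d = X.shape
    h = np.full(d, float(h0))
    active = np.ones(d, dtype=bool)
    c_n = np.log(n)  # assumed choice of the slowly growing constant
    for _ in range(max_iter):
        if not active.any():
            break
        Zi = bandwidth_derivatives(X, x, h)
        Z = Zi.mean(axis=0)                       # derivative estimate per dimension
        s = Zi.std(axis=0, ddof=1) / np.sqrt(n)   # its estimated standard error
        lam = s * np.sqrt(2.0 * np.log(n * c_n))  # assumed threshold form
        shrink = active & (np.abs(Z) > lam)
        h[shrink] *= beta
        active = shrink                           # dimensions that fail the test are frozen
    return h
```

As a usage example, calling density_rodeo(X, x) on an (n, d) sample array X and a length-d query point x returns one bandwidth per dimension, and kde(X, x, h) then evaluates the density estimate with those bandwidths. Dimensions along which the density varies little near x are expected to keep comparatively large bandwidths, which is the mechanism by which a sparse structure can reduce the effective dimension.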