Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo

We consider the problem of estimating the joint density of a d-dimensional random vector X = (X1, X2, ..., Xd) when d is large. We assume that the density is the product of a parametric component and a nonparametric component that depends on an unknown subset of the variables. Using a modification of a recently developed nonparametric regression framework called the rodeo (regularization of derivative expectation operator), we propose a method that greedily selects the bandwidths of a kernel density estimate. We show empirically that the density rodeo works well even for very high-dimensional problems. When the unknown density function satisfies a suitably defined sparsity condition and the parametric baseline density is smooth, the approach is shown to achieve near-optimal minimax rates of convergence, and thus avoids the curse of dimensionality.
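The greedy bandwidth selection can be illustrated with a minimal sketch of a local (single-point) density rodeo. This is not the authors' implementation. It makes several assumptions that do not come from the abstract: it uses a Gaussian product kernel, it ignores the parametric baseline component, and it uses the threshold s_j * sqrt(2 log n). The names `local_density_rodeo`, `h0`, `beta` and `h_min` are hypothetical. Every bandwidth starts large. For each still-active coordinate j, the sketch estimates the derivative Z_j of the kernel estimate with respect to h_j, together with its standard error s_j. If |Z_j| is large relative to s_j, it shrinks h_j by the factor beta. Otherwise it freezes h_j.

```python
import numpy as np


def gaussian_kernel(t):
    """Standard normal density, used as the 1-D kernel K."""
    return np.exp(-0.5 * t**2) / np.sqrt(2.0 * np.pi)


def local_density_rodeo(X, x, h0=1.0, beta=0.9, h_min=1e-3):
    """Hypothetical sketch: greedy per-coordinate bandwidths at a point x.

    Assumptions (not from the source): Gaussian product kernel, no
    parametric baseline, threshold lambda_j = s_j * sqrt(2 log n).
    """
    n, d = X.shape
    h = np.full(d, float(h0))
    active = list(range(d))
    thresh = np.sqrt(2.0 * np.log(n))
    while active:
        U = x - X                          # (n, d) differences x_j - X_ij
        K = gaussian_kernel(U / h) / h     # (n, d) values of (1/h_j) K(u/h_j)
        prod = np.prod(K, axis=1)          # product kernel for each sample
        still_active = []
        for j in active:
            # d/dh_j of the product kernel = product * (u_j^2 - h_j^2) / h_j^3
            Zi = prod * (U[:, j] ** 2 - h[j] ** 2) / h[j] ** 3
            Zj = Zi.mean()
            sj = Zi.std(ddof=1) / np.sqrt(n)
            if abs(Zj) > sj * thresh and beta * h[j] >= h_min:
                h[j] *= beta               # significant derivative: shrink h_j
                still_active.append(j)
            # otherwise h_j is frozen and j leaves the active set
        active = still_active
    # Kernel density estimate at x using the selected bandwidths
    K = gaussian_kernel((x - X) / h) / h
    return h, np.prod(K, axis=1).mean()


# Demo: only coordinate 0 varies near x; coordinates 1 and 2 are flat there.
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([
    rng.normal(0.0, 0.1, n),
    rng.uniform(0.0, 10.0, n),
    rng.uniform(0.0, 10.0, n),
])
x = np.array([0.0, 5.0, 5.0])
h, f_hat = local_density_rodeo(X, x)
print("bandwidths:", h, "density estimate:", f_hat)
```

In this synthetic demo, the true density is locally flat in coordinates 1 and 2 around x. Their estimated derivatives should therefore be insignificant, and those bandwidths are expected to stay at h0. The bandwidth for the relevant coordinate 0 is expected to shrink. This behavior illustrates how the rodeo adapts the amount of smoothing in each variable separately.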