# Estimating Probability Distributions and their Properties

This thesis studies several theoretical problems in nonparametric statistics and machine learning, mostly in the areas of nonparametric density functional estimation


(estimating an integral functional of the population distribution from which the data are drawn) and nonparametric density estimation (estimating the entire population distribution from which the data are drawn). A consistent theme is that, although nonparametric density estimation is traditionally thought to be intractable in high dimensions, several equally (or more) useful tasks are relatively more tractable, even under similar or weaker assumptions on the distribution.

Our work on density functional estimation focuses on several types of integral functionals, such as information-theoretic quantities (entropies, mutual informations, and divergences), measures of smoothness, and measures of (dis)similarity between distributions, which play important roles as subroutines elsewhere in statistics, machine learning, and signal processing. For each of these quantities, under a variety of nonparametric models, we provide some combination of (a) new estimators, (b) upper bounds on the convergence rates of these new estimators, (c) new upper bounds on the convergence rates of established estimators, (d) concentration bounds or asymptotic distributions for estimators, or (e) lower bounds on the minimax risk of estimation. We briefly discuss some applications of these density functional estimators to hypothesis testing problems such as two-sample (homogeneity) or (conditional) independence testing.

For density estimation, whereas the majority of prior work has focused on estimation

under L^2 or other L^p losses, we consider minimax convergence rates under several new losses, including the whole spectrum of Wasserstein distances and a large class of metrics called integral probability metrics (IPMs) that includes, for example, L^p, total variation, Kolmogorov-Smirnov, earth-mover, Sobolev, Besov, and some RKHS distances. These losses open several new possibilities for nonparametric density estimation in certain cases; some examples include:

- convergence rates with no or reduced dependence on dimension
- density-free distribution estimation, for data lying in general (e.g., non-Euclidean) metric spaces, or for data whose distribution may not be absolutely continuous with respect to Lebesgue measure
- convergence rates depending only on the intrinsic dimension of the data

Our main results here are the derivation of minimax convergence rates. However, we also briefly discuss several consequences of our results. For example, we show that IPMs have close connections with generative adversarial networks (GANs), and we leverage our results to prove the first finite-sample guarantees for GANs, in an idealized model of GANs as density estimators. These results may help explain why these tools appear to perform well at problems that are intractable from traditional perspectives of nonparametric statistics. We also briefly discuss consequences for estimation of certain density functionals, Monte Carlo integration of smooth functions, and distributionally robust optimization.
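As a concrete illustration of these losses (a standard textbook fact, not an estimator from the thesis): the earth-mover distance is the IPM obtained by taking the supremum of |E_P f − E_Q f| over 1-Lipschitz functions f, and in one dimension it reduces, for two equal-size samples, to the mean absolute difference of the sorted samples. This makes the "density-free" point tangible — two samples can be compared without ever estimating a density:

```python
# Hypothetical sketch: empirical 1-Wasserstein (earth-mover) distance in 1-D.
# This is the IPM sup over 1-Lipschitz f of |E_P f - E_Q f|; for equal-size
# 1-D samples it equals the mean absolute difference of the sorted samples.

def wasserstein1_empirical(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples."""
    if len(xs) != len(ys):
        raise ValueError("this simple sketch assumes equal-size samples")
    xs, ys = sorted(xs), sorted(ys)
    # Optimal transport in 1-D matches the i-th smallest point of one sample
    # to the i-th smallest point of the other.
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)

# Usage: shifting a sample by a constant moves it exactly that far.
sample = [0.1, 0.5, 0.9, 1.3]
shifted = [x + 2.0 for x in sample]
print(wasserstein1_empirical(sample, shifted))  # 2.0
```

Note that this distance is well defined even when the samples are drawn from a discrete distribution with no density, which is one reason Wasserstein and other IPM losses broaden the scope of distribution estimation.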

## History

## Date

19/08/2019

## Degree Type

Dissertation

## Department

Machine Learning

## Degree Name

- Doctor of Philosophy (PhD)