Estimating Probability Distributions and their Properties

thesis
posted on 24.02.2020, 18:03 by Shashank Singh
This thesis studies several theoretical problems in nonparametric statistics and machine learning, mostly in the areas of nonparametric density functional estimation (estimating an integral functional of the population distribution from which the data are drawn) and nonparametric density estimation (estimating the entire population distribution from which the data are drawn). A consistent theme is that, although nonparametric density estimation is traditionally thought to be intractable in high dimensions, several equally (or more) useful tasks are relatively more tractable, even with similar or weaker assumptions on the distribution.

Our work on density functional estimation focuses on several types of integral functionals, such as information theoretic quantities (entropies, mutual informations, and divergences), measures of smoothness, and measures of (dis)similarity between distributions, which play important roles as subroutines elsewhere in statistics, machine learning, and signal processing. For each of these quantities, under a variety of nonparametric models, we provide some combination of (a) new estimators, (b) upper bounds on convergence rates of these new estimators, (c) new upper bounds on the convergence rates of established estimators, (d) concentration bounds or asymptotic distributions for estimators, or (e) lower bounds on the minimax risk of estimation. We briefly discuss some applications of these density functional estimators to hypothesis testing problems such as two-sample (homogeneity) or (conditional) independence testing.

For density estimation, whereas the majority of prior work has focused on estimation under L2 or other Lp losses, we consider minimax convergence rates under several new losses, including the whole spectrum of Wasserstein distances and a large class of metrics called integral probability metrics (IPMs) that includes, for example, Lp, total variation, Kolmogorov-Smirnov, earth-mover, Sobolev, Besov, and some RKHS distances. These losses open several new possibilities for nonparametric density estimation in certain cases; some examples include
- convergence rates with no or reduced dependence on dimension
- density-free distribution estimation, for data lying in general (e.g., non-Euclidean) metric spaces, or for data whose distribution may not be absolutely continuous with respect to Lebesgue measure
- convergence rates depending only on intrinsic dimension of data
Our main results here are the derivation of minimax convergence rates. However, we also briefly discuss several consequences of our results. For example, we show that IPMs have close connections with generative adversarial networks (GANs), and we leverage our results to prove the first finite-sample guarantees for GANs, in an idealized model of GANs as density estimators. These results may help explain why these tools appear to perform well at problems that are intractable from traditional perspectives of nonparametric statistics. We also briefly discuss consequences for estimation of certain density functionals, Monte Carlo integration of smooth functions, and distributionally robust optimization.

History

Date

19/08/2019

Degree Type

Dissertation

Department

Machine Learning

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Barnabas Poczos