# Advances in Nonasymptotic and Nonparametric Inference

This thesis develops tools for hypothesis testing and predictive inference in nonasymptotic settings. The universal likelihood ratio test (LRT) constructs hypothesis tests that are valid in finite samples and without regularity conditions. We implement the universal LRT to test the population mean of d-dimensional Gaussian data and to test whether a density satisfies the nonparametric shape constraint of log-concavity. Conformal predictive inference produces valid prediction sets in finite samples without model assumptions when the data are exchangeable. We extend conformal prediction to the random effects setting.

The classical LRT, which relies on the asymptotic chi-squared distribution of the log likelihood ratio statistic, is one of the fundamental tools of statistical inference. A recent universal LRT approach based on sample splitting provides valid hypothesis tests and confidence sets in any setting for which we can compute the split likelihood ratio statistic (or, more generally, an upper bound on the maximum likelihood under the null). This test lets statisticians construct tests in settings for which no valid hypothesis test previously existed. Chapter 1 explains the universal LRT.

In Chapter 2, we consider the simple but fundamental case of testing the population mean of d-dimensional Gaussian data. This work presents the first in-depth exploration of the size, power, and relationships among several universal LRT variants. We show that a repeated subsampling approach is the best choice in terms of size and power, and we observe reasonable performance even in a high-dimensional setting. We illustrate the benefits of the universal LRT by testing a nonconvex, doughnut-shaped null hypothesis, for which a universal inference procedure can have higher power than a standard approach.

Chapter 3 investigates the use of universal LRTs to test whether a density is log-concave. The shape constraint of log-concavity yields a nonparametric density estimation problem with favorable convergence properties. We propose and implement several universal LRT variants for this test, which together provide the first test of log-concavity with finite-sample validity. We evaluate these tests on two-component Gaussian mixture models and on the Beta family, and find that universal LRTs that convert the d-dimensional testing problem into a one-dimensional testing problem can have the best performance.
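As a rough illustration of the split LRT applied to the Chapter 2 problem, the Python sketch below tests H0: μ = μ0 for d-dimensional Gaussian data. It assumes identity covariance and a simple (point) null, uses the sample mean of one half of the data as the alternative estimate, and rejects when the statistic reaches 1/α. The function names `split_log_lr` and `universal_lrt`, and the choice to average the statistic over random splits as the subsampling variant, are illustrative assumptions rather than the thesis code.

```python
import numpy as np

def split_log_lr(x0, x1, mu_null):
    """Log split likelihood ratio for H0: mean == mu_null.

    Assumes an N(mu, I_d) model (an illustrative assumption). x1 is used only to
    estimate the mean (its sample mean); x0 is used only to evaluate likelihoods.
    Gaussian normalizing constants cancel in the ratio, so they are omitted.
    """
    mu_alt = x1.mean(axis=0)
    ll_alt = -0.5 * np.sum((x0 - mu_alt) ** 2)
    ll_null = -0.5 * np.sum((x0 - mu_null) ** 2)
    return ll_alt - ll_null

def universal_lrt(x, mu_null, alpha=0.05, n_splits=1, seed=None):
    """Reject H0 when the (averaged) split LR statistic is at least 1 / alpha.

    n_splits == 1 is a single random split; n_splits > 1 averages the statistic
    over independent random splits (an assumed subsampling variant). Each split
    statistic has null expectation 1, so the average does too, and Markov's
    inequality gives P(reject | H0) <= alpha.
    """
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    half = n // 2
    log_stats = np.empty(n_splits)
    for b in range(n_splits):
        perm = rng.permutation(n)
        # First half evaluates the likelihoods; second half fits the mean.
        log_stats[b] = split_log_lr(x[perm[:half]], x[perm[half:]], mu_null)
    # Log of the mean of exp(log_stats), computed stably to avoid overflow.
    log_t = np.logaddexp.reduce(log_stats) - np.log(n_splits)
    return bool(log_t >= np.log(1.0 / alpha)), float(log_t)
```

For data `x` of shape `(n, d)`, a call such as `universal_lrt(x, np.zeros(x.shape[1]), n_splits=20)` returns `(reject, log_statistic)`. The comparison is made on the log scale so that very large statistics under the alternative do not overflow.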

Chapter 4 reviews the method of conformal predictive inference. Conformal prediction methods construct valid prediction sets in finite samples even when the assumed model is incorrect, under the assumption that the data are exchangeable. In Chapter 5, we extend the conformal method so that it remains valid with random effects, in which case the data are not exchangeable. We develop a CDF pooling approach, a single subsampling approach, and a repeated subsampling approach to construct conformal prediction sets in unsupervised and supervised settings, and we compare these approaches in terms of coverage and average set size. We recommend the repeated subsampling approach, which constructs a conformal set by sampling one observation from each distribution multiple times. Simulations show that this approach has the best balance between coverage and average conformal set size.
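The repeated subsampling idea can be sketched in Python for the unsupervised case with scalar observations. This is a minimal sketch under assumptions not stated in the abstract: groups carry i.i.d. random effects, observations within a group are i.i.d. given that effect, and the new observation comes from a new group, so one draw per group plus the new point are exchangeable. The nonconformity score (distance to the median of the augmented sample), the candidate grid, and the rule for combining draws (average the conformal p-values and keep y when the average exceeds α/2, which is valid because twice an average of p-values is a valid p-value) are choices made here for illustration. The function names are hypothetical.

```python
import numpy as np

def conformal_pvalue(sample, y):
    """Full-conformal p-value of candidate y given exchangeable scalars `sample`.

    Score (an illustrative choice): distance to the median of the augmented sample.
    The median is symmetric in all k + 1 points, so the p-value is valid under
    exchangeability.
    """
    z = np.append(sample, y)
    scores = np.abs(z - np.median(z))
    return np.mean(scores >= scores[-1])

def repeated_subsampling_set(groups, grid, alpha=0.1, n_draws=50, seed=None):
    """Keep candidate values in `grid` whose averaged conformal p-value exceeds alpha / 2.

    groups: list of 1-D arrays, one per group (hypothetical data layout).
    Each draw takes one observation per group; under the assumed random-effects
    model those k points and a new observation from a new group are exchangeable.
    Twice an average of p-values is a valid p-value, so the alpha / 2 threshold
    gives coverage of at least 1 - alpha (an assumed combination rule).
    """
    rng = np.random.default_rng(seed)
    grid = np.asarray(grid, dtype=float)
    avg_p = np.zeros(grid.size)
    for _ in range(n_draws):
        sample = np.array([rng.choice(g) for g in groups])  # one observation per group
        avg_p += np.array([conformal_pvalue(sample, y) for y in grid])
    avg_p /= n_draws
    return grid[avg_p > alpha / 2]
```

Passing a fine grid such as `np.linspace(-5, 5, 201)` returns the grid points in the prediction set, which approximates the continuous set; passing a single candidate value checks whether that value is covered.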

## History

## Date

2021-07-30

## Degree Type

- Dissertation

## Department

- Statistics and Data Science

## Degree Name

- Doctor of Philosophy (PhD)