Statistical Theory and Methods for Comparing Distributions

Kim, Ilmun

doi:10.1184/R1/12327773.v1

ilmunk_phd_stat_2020.pdf (6.56 MB)

thesis

Posted on 2020-05-19, 19:53; authored by Ilmun Kim

With recent advances in data collection techniques, there has been explosive growth in the size and complexity of data sets in many application domains. The rise of such unprecedented data has posed new challenges as well as new opportunities for researchers in statistics and data science. Traditional methods, tailored to static and low-dimensional data, perform poorly or are no longer applicable to modern high-dimensional data with complex structures. Moreover, classical asymptotic theory easily breaks down under non-traditional settings where numerous parameters can interact in dynamic ways. Motivated by these new challenges, this dissertation aims to develop novel methods and technical tools suitable for modern high-dimensional data, with particular emphasis on three types of testing problems: (i) one-sample testing, (ii) two-sample testing, and (iii) independence testing.

One of the major contributions of this thesis is to introduce a flexible two-sample testing framework that can leverage any existing classification or regression method. By taking advantage of state-of-the-art algorithms in machine learning, the proposed method can efficiently handle different types of variables and various structures in high-dimensional data with competitive power under a variety of practical scenarios. To justify our approach, we provide rigorous theoretical and empirical analysis of its performance. With a specific focus on Fisher's linear discriminant analysis, we prove more sophisticated results including minimax optimality under common regularity conditions. In addition to supervised learning approaches, we also contribute to the literature by proposing goodness-of-fit tests for high-dimensional multinomials as well as multivariate generalizations of classical rank-based tests.
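
As a rough illustration of the classification-based idea described above, the following minimal sketch tests whether two samples share a distribution by checking whether a classifier can tell them apart on held-out data. It is not the procedure developed in the thesis: the nearest-centroid rule (a simplified stand-in for Fisher's linear discriminant with identity covariance), the half-and-half sample split, the conservative normal approximation, and the function name classifier_two_sample_test are all assumptions made for this example.

```python
import math
import random


def centroid(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def classifier_two_sample_test(xs, ys, seed=0):
    """Return (held-out balanced accuracy, one-sided p-value) for H0: P = Q.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    xs, ys = list(xs), list(ys)
    rng.shuffle(xs)
    rng.shuffle(ys)
    hx, hy = len(xs) // 2, len(ys) // 2

    # Training half: fit a nearest-centroid rule (assumed classifier).
    cx, cy = centroid(xs[:hx]), centroid(ys[:hy])

    # Test half: a point is labelled "X" iff it is strictly closer to cx.
    test_x, test_y = xs[hx:], ys[hy:]
    acc_x = sum(sq_dist(p, cx) < sq_dist(p, cy) for p in test_x) / len(test_x)
    acc_y = sum(sq_dist(p, cy) <= sq_dist(p, cx) for p in test_y) / len(test_y)
    acc = (acc_x + acc_y) / 2

    # Under H0 the balanced accuracy has mean 1/2 and variance at most
    # (1/n_x + 1/n_y) / 16, so this z-score is conservative.
    se = math.sqrt((1 / len(test_x) + 1 / len(test_y)) / 16)
    z = (acc - 0.5) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper normal tail
    return acc, p_value
```

Any classifier could replace the nearest-centroid rule here, provided it is fitted on the training half only. Held-out accuracy well above one half is evidence against the null hypothesis that the two distributions are equal.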

Another theme of this dissertation is concerned with permutation tests. Although the permutation approach is standard in practical implementations of two-sample and independence testing, its theoretical properties, especially power, have not been explored beyond simple cases. A major challenge of analyzing the permutation test is that it depends on a random critical value which is a function of observations. We study how to overcome this challenge and demonstrate that the permutation test has competitive power properties for many interesting problems under non-traditional settings. In particular, we use the minimax perspective to evaluate the performance of a test and show that the permutation test is optimal for the problems where minimax lower bounds are available.
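
To make the "random critical value" concrete, here is a minimal sketch of a two-sample permutation test for univariate data. The absolute difference in sample means, the number of permutations, and the function name permutation_test are assumptions chosen for illustration; the thesis studies the permutation approach in far greater generality.

```python
import random


def mean_diff(xs, ys):
    """Absolute difference of sample means (assumed test statistic)."""
    return abs(sum(xs) / len(xs) - sum(ys) / len(ys))


def permutation_test(xs, ys, num_perm=999, alpha=0.05, seed=0):
    """Return (observed statistic, critical value, p-value).

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    pooled = list(xs) + list(ys)
    n = len(xs)
    observed = mean_diff(xs, ys)

    # Recompute the statistic on random relabellings of the pooled sample.
    perm_stats = []
    for _ in range(num_perm):
        rng.shuffle(pooled)
        perm_stats.append(mean_diff(pooled[:n], pooled[n:]))

    # p-value with the usual +1 correction (the observed labelling counts).
    p_value = (1 + sum(s >= observed for s in perm_stats)) / (num_perm + 1)

    # The critical value is an upper quantile (level roughly 1 - alpha) of
    # the permutation distribution. It is computed from the pooled
    # observations, so it changes from data set to data set.
    ranked = sorted(perm_stats + [observed])
    k = num_perm + 1 - int(alpha * (num_perm + 1))
    critical = ranked[k - 1]
    return observed, critical, p_value
```

The test rejects when the observed statistic exceeds the returned critical value. Because that threshold is itself a function of the data, the power of the test cannot be analysed as if the cutoff were a fixed constant.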

History

Date

2020-05-11

Degree Type

Dissertation

Department

Statistics

Degree Name

Doctor of Philosophy (PhD)

Advisor(s)

Larry Wasserman, Sivaraman Balakrishnan

Keywords

Hypothesis testing, Asymptotic theory, Minimax optimality

Licence

In Copyright
