On the High-dimensional Power of Linear-time Kernel Two-Sample Testing under Mean-difference Alternatives

Ramdas, Aaditya; Reddi, Sashank J.; Poczos, Barnabas; Singh, Aarti; Wasserman, Larry

doi:10.1184/R1/6476207.v1

journal contribution

posted on 2014-11-25, 00:00 authored by Aaditya Ramdas, Sashank J. Reddi, Barnabas Poczos, Aarti Singh, Larry Wasserman

Nonparametric two-sample testing addresses the question of consistently deciding whether two distributions differ, given samples from both, without making any parametric assumptions about the form of the distributions. The current literature is split into two kinds of tests: those that are consistent without any assumptions about how the distributions may differ (general alternatives), and those designed specifically to test easier alternatives, such as a difference in means (mean-shift alternatives).

The main contribution of this paper is to explicitly characterize the power of a popular nonparametric two-sample test, designed for general alternatives, under a mean-shift alternative in the high-dimensional setting. Specifically, we derive the power of the linear-time Maximum Mean Discrepancy statistic using the Gaussian kernel, where the dimension and sample size can both tend to infinity at any rate, and the two distributions differ in their means. As a corollary, we find that if the signal-to-noise ratio is held constant, then the test’s power goes to one if the number of samples grows faster than the dimension. This is the first explicit power derivation for a general nonparametric test in the high-dimensional setting, and also the first analysis of how tests designed for general alternatives perform when faced with easier ones.
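
For readers unfamiliar with the statistic named in the abstract, the following is a minimal sketch of a linear-time Maximum Mean Discrepancy test with a Gaussian kernel, applied to a mean-shift alternative. It is not taken from the paper. The function names (`gaussian_kernel`, `linear_time_mmd_test`, `sample_gaussian`), the bandwidth choice, the level `alpha`, and the studentized normal threshold are all assumptions made for illustration, following the usual construction that averages a kernel quantity over disjoint pairs of samples.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def gaussian_kernel(u, v, bandwidth):
    """Gaussian kernel exp(-||u - v||^2 / (2 * bandwidth^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def linear_time_mmd_test(xs, ys, bandwidth, alpha=0.05):
    """Return (studentized statistic, reject flag) for a linear-time MMD test.

    Hypothetical helper: samples are consumed in disjoint pairs, so the cost
    is linear in the sample size rather than quadratic.
    """
    n = min(len(xs), len(ys))
    m = n // 2  # number of disjoint pairs
    h = []
    for i in range(m):
        x1, x2 = xs[2 * i], xs[2 * i + 1]
        y1, y2 = ys[2 * i], ys[2 * i + 1]
        h.append(
            gaussian_kernel(x1, x2, bandwidth)
            + gaussian_kernel(y1, y2, bandwidth)
            - gaussian_kernel(x1, y2, bandwidth)
            - gaussian_kernel(x2, y1, bandwidth)
        )
    # The mean of h estimates the squared MMD; dividing by its standard
    # error gives a statistic that is approximately standard normal
    # when the two distributions are equal (assumed asymptotic threshold).
    z = math.sqrt(m) * mean(h) / stdev(h)
    threshold = NormalDist().inv_cdf(1.0 - alpha)
    return z, z > threshold


def sample_gaussian(n, dim, shift, rng):
    """Draw n points from N(shift * 1, I) in `dim` dimensions (illustrative)."""
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]


rng = random.Random(0)
dim, n = 5, 2000
xs = sample_gaussian(n, dim, 0.0, rng)   # first distribution: mean zero
ys = sample_gaussian(n, dim, 0.5, rng)   # second distribution: shifted mean
z, reject = linear_time_mmd_test(xs, ys, bandwidth=math.sqrt(dim))
print(f"studentized statistic = {z:.3f}, reject null = {reject}")
```

Rerunning the sketch with a larger `dim` at fixed `n`, or a larger `n` at fixed `dim`, gives an informal feel for the trade-off between dimension and sample size that the abstract describes. The bandwidth of `sqrt(dim)` is only one convenient scaling and is an assumption of this example.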

Keywords

Machine Learning

Licence

In Copyright
