Nonparametric Divergence Estimation and its Applications to Machine Learning
Low-dimensional embedding, manifold learning, clustering, classification, and anomaly detection are among the most important problems in machine learning. Here we consider the setting where each input instance corresponds to a continuous probability distribution. These distributions are unknown to us, but we are given some i.i.d. samples from each of them. While most existing machine learning methods operate on points, i.e. finite-dimensional feature vectors, we study algorithms that operate on groups, i.e. sets of feature vectors. For this purpose, we propose new nonparametric, consistent estimators for a large family of divergences and describe how to apply them to machine learning problems. As important special cases, the estimators can be used to estimate the Rényi, Tsallis, and Kullback-Leibler divergences, the Hellinger and Bhattacharyya distances, L2 divergences, and mutual information. We present empirical results on synthetic data, real-world images, and astronomical data sets.
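To make the setting concrete, the following is a minimal sketch of estimating a divergence directly from two samples, without fitting a density model. It assumes a classical k-nearest-neighbor estimator of the Kullback-Leibler divergence, which is one well-known nonparametric approach and is not necessarily the estimator proposed here. The names `knn_distance`, `knn_kl_divergence`, and `gaussian_sample` are hypothetical, and the brute-force neighbor search is chosen for brevity rather than speed.

```python
import math
import random


def knn_distance(point, sample, k, skip_self=False):
    """Euclidean distance from `point` to its k-th nearest neighbor in `sample`."""
    dists = sorted(math.dist(point, other) for other in sample)
    # If `point` is itself a member of `sample`, dists[0] is the zero
    # self-distance, so the k-th true neighbor sits at index k.
    return dists[k] if skip_self else dists[k - 1]


def knn_kl_divergence(xs, ys, k=1):
    """k-NN estimate of KL(p || q) from xs ~ p and ys ~ q.

    Both samples are lists of equal-length tuples. Assumes continuous data,
    so that no two points coincide (a zero distance would break the log).
    """
    n, m, d = len(xs), len(ys), len(xs[0])
    total = 0.0
    for x in xs:
        rho = knn_distance(x, xs, k, skip_self=True)  # within-sample distance
        nu = knn_distance(x, ys, k)                   # cross-sample distance
        total += math.log(nu / rho)
    return d * total / n + math.log(m / (n - 1))


def gaussian_sample(mean, size):
    """Draw `size` one-dimensional points from a unit-variance Gaussian."""
    return [(random.gauss(mean, 1.0),) for _ in range(size)]


random.seed(0)
# Two groups drawn from the same distribution: the estimate should be near 0.
same = knn_kl_divergence(gaussian_sample(0.0, 500), gaussian_sample(0.0, 500), k=5)
# Two groups whose means differ by 2: the true KL divergence is 2.
shifted = knn_kl_divergence(gaussian_sample(0.0, 500), gaussian_sample(2.0, 500), k=5)
print(same, shifted)
```

Each group of feature vectors is treated as a sample from its own unknown distribution, and the resulting pairwise divergence estimates can then serve as inputs to distance-based methods for tasks such as clustering or classification.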