Distribution and Histogram (DisH) Learning

Junier Oliva (2019)
Machine learning has made incredible advances in the last couple of decades. Notwithstanding, much of this progress has been limited to basic point-estimation tasks. That is, a large bulk of attention has been geared at solving problems that take in a static finite vector and map it to another static finite vector. However, we do not navigate through life in a series of point-estimation problems, mapping x to y. Instead, we find broad patterns and gather a far-sighted understanding of data by considering collections of points like sets, sequences, and distributions. Thus, contrary to what various billionaires, celebrity theoretical physicists, and sci-fi classics would lead you to believe, true machine intelligence is currently fairly out of reach. In order to bridge this gap, this thesis develops algorithms that understand data at an aggregate, holistic level.

This thesis pushes machine learning past the realm of operating over static finite vectors, to start reasoning ubiquitously with complex, dynamic collections like sets and sequences. We develop algorithms that consider distributions as functional covariates/responses, and methods that use distributions as internal representations. We consider distributions since they are a straightforward characterization of many natural phenomena and provide a richer description than simple point data by detailing information at an aggregate level. Our approach may be seen as addressing two sides of the same coin: on one side, we use traditional machine learning algorithms adjusted to operate directly on inputs and outputs that are probability functions (and sample sets); on the other side, we develop better estimators for traditional tasks by making use of, and adjusting, internal distributions.

We begin by developing algorithms for traditional machine learning tasks for the cases when one's input (and possibly output) is not a finite point, but is instead a distribution, or a sample set drawn from a distribution. We develop a scalable nonparametric estimator for regressing a real-valued response given an input that is a distribution, a case which we coin distribution to real regression (DRR). Furthermore, we extend this work to the case when both the output response and the input covariate are distributions, a task we call distribution to distribution regression (DDR).

Afterward, we look to expand the versatility and efficacy of traditional machine learning tasks through novel methods that operate with distributions of features. For example, we show that one may improve the performance of kernel learning tasks by learning a kernel's spectral distribution in a data-driven fashion using Bayesian nonparametric techniques. Moreover, we study how to perform sequential modeling by looking at summary statistics from past points. Lastly, we also develop methods for high-dimensional density estimation that make use of flexible transformations of variables and autoregressive conditionals.
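To ground the DRR setup described above, here is a minimal sketch of one common embedding-based approach: represent each input sample set by a finite-dimensional kernel mean embedding built from random Fourier features, then fit a linear (ridge) regressor on the embeddings. This is an illustration of the problem setup on assumed toy data, not the thesis's own scalable nonparametric estimator; all names and parameters (`rff_mean_embedding`, the feature count, the toy response) are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def rff_mean_embedding(X, W, b):
    """Embed a sample set X of shape (n_points, d) as the average of its
    random Fourier features: a finite-dimensional kernel mean embedding."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(X @ W + b).mean(axis=0)

# Illustrative toy DRR data (not from the thesis): each input is a sample
# set drawn from N(mu, I); the real-valued response depends on the
# underlying distribution, here via its mean mu.
d, n_feats = 2, 200
W = rng.normal(size=(d, n_feats))                # spectral frequencies (RBF kernel)
b = rng.uniform(0.0, 2.0 * np.pi, size=n_feats)  # random phases

sets, ys = [], []
for _ in range(300):
    mu = rng.uniform(-3.0, 3.0, size=d)
    sets.append(rng.normal(mu, 1.0, size=(50, d)))  # one observed sample set
    ys.append(mu.sum() + rng.normal(scale=0.1))     # its real-valued response

Z = np.stack([rff_mean_embedding(S, W, b) for S in sets])
model = Ridge(alpha=1e-2).fit(Z[:200], ys[:200])
print("held-out R^2:", model.score(Z[200:], ys[200:]))
```

The same template suggests how DDR generalizes the problem: the scalar response would itself be replaced by an embedded (or sampled) output distribution, though the thesis's DDR estimator is not reproduced here.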
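The kernel-learning thread rests on Bochner's theorem, which is what the abstract alludes to in speaking of a kernel's spectral distribution: a continuous, shift-invariant, positive-definite kernel is the Fourier transform of a probability measure over frequencies. The identity below is the standard statement; the particular Bayesian nonparametric model the thesis places over this measure is not spelled out in the abstract.

```latex
% Bochner's theorem: a continuous shift-invariant positive-definite
% kernel is the Fourier transform of a spectral distribution P:
\[
  k(x - y)
  \;=\; \int_{\mathbb{R}^d} e^{\, i \, \omega^\top (x - y)} \, \mathrm{d}P(\omega)
  \;=\; \mathbb{E}_{\omega \sim P}\!\left[ e^{\, i \, \omega^\top (x - y)} \right].
\]
% Learning the kernel therefore reduces to learning the spectral
% distribution P, e.g. with a nonparametric mixture over frequencies.
```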
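Likewise, the high-dimensional density estimators mentioned in the final sentence combine two standard ingredients, shown below in generic form (the thesis's specific transformations and conditional models are not detailed in this abstract): an invertible change of variables and an autoregressive factorization of the transformed density.

```latex
% (i) invertible transformation z = f(x) via the change-of-variables
% formula; (ii) autoregressive factorization of the base density on z:
\[
  p_X(x) \;=\; p_Z\big(f(x)\big)
  \left| \det \frac{\partial f(x)}{\partial x} \right|,
  \qquad
  p_Z(z) \;=\; \prod_{d=1}^{D} p\big(z_d \mid z_1, \ldots, z_{d-1}\big).
\]
```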