<dc:dc xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
<dc:title>Distribution and Histogram (DisH) Learning</dc:title>
<dc:creator>Junier Oliva</dc:creator>
<dc:identifier identifierType="DOI">10.1184/R1/7553882.v1</dc:identifier>
<dc:relation>https://kilthub.cmu.edu/articles/thesis/Distribution_and_Histogram_DisH_Learning/7553882</dc:relation>
<dc:description>Machine learning has made incredible advances in the last couple of decades. Nevertheless, much of this progress has been limited to basic point-estimation tasks. That is, the bulk of attention has been directed at solving problems that take in a static finite vector and map it to another static finite vector. However, we do not navigate through life in a series of point-estimation problems, mapping x to y. Instead, we find broad patterns and gather a far-sighted understanding of data by considering collections of points like sets, sequences, and distributions. Thus, contrary to what various billionaires, celebrity theoretical physicists, and sci-fi classics would lead you to believe, true machine intelligence is currently fairly out of reach. In order to bridge this gap, this thesis develops algorithms that understand data at an aggregate, holistic level.&lt;br&gt;This thesis pushes machine learning past the realm of operating over static finite vectors, to start reasoning ubiquitously with complex, dynamic collections like sets and sequences. We develop algorithms that consider distributions as functional covariates/responses, and methods that use distributions as internal representations.
We consider distributions since they are a straightforward characterization of many natural phenomena and provide a richer description than simple point data by detailing information at an aggregate level. Our approach may be seen as addressing two sides of the same coin: on one side, we use traditional machine learning algorithms adjusted to directly operate on inputs and outputs that are probability functions (and sample sets); on the other side, we develop better estimators for traditional tasks by making use of and adjusting internal distributions.&lt;br&gt;We begin by developing algorithms for traditional machine learning tasks for the cases when one&#x27;s input (and/or possibly output) is not a finite point, but is instead a distribution, or a sample set drawn from a distribution. We develop a scalable nonparametric estimator for regressing a real-valued response given an input that is a distribution, a task we term distribution to real regression (DRR). Furthermore, we extend this work to the case when both the output response and the input covariate are distributions, a task we call distribution to distribution regression (DDR).&lt;br&gt;Next, we look to expand the versatility and efficacy of traditional machine learning tasks through novel methods that operate with distributions of features. For example, we show that one may improve the performance of kernel learning tasks by learning a kernel&#x27;s spectral distribution in a data-driven fashion using Bayesian nonparametric techniques. Moreover, we study how to perform sequential modeling by looking at summary statistics from past points. Lastly, we also develop methods for high-dimensional density estimation that make use of flexible transformations of variables and autoregressive conditionals.</dc:description>
<dc:date>2018-07-01 00:00:00</dc:date>
<dc:subject>Distributions</dc:subject>
<dc:subject>Sequences</dc:subject>
<dc:subject>Sets</dc:subject>
<dc:subject>Nonparametric</dc:subject>
<dc:subject>Statistics</dc:subject>
<dc:subject>Machine Learning</dc:subject>
<dc:subject>Knowledge Representation and Machine Learning</dc:subject>
</dc:dc>