joliva_MachineLearning_2018.pdf (17.74 MB)

Distribution and Histogram (DisH) Learning

thesis
posted on 2018-07-01, 00:00 authored by
a lot of this progress has been limited to basic point-estimation tasks. That is,
a large bulk of attention has been geared at solving problems that take in a static finite
vector and map it to another static finite vector. However, we do not navigate through life
in a series of point-estimation problems, mapping x to y. Instead, we find broad patterns
and gather a far-sighted understanding of data by considering collections of points like
sets, sequences, and distributions. Thus, contrary to what various billionaires, celebrity
theoretical physicists, and sci-fi classics would lead you to believe, true machine intelligence
is fairly out of reach currently. In order to bridge this gap, this thesis develops algorithms
that understand data at an aggregate, holistic level.
This thesis pushes machine learning past the realm of operating over static finite vectors,
to start reasoning ubiquitously with complex, dynamic collections like sets and sequences.
We develop algorithms that consider distributions as functional covariates/responses, and
methods that use distributions as internal representations. We consider distributions since
they are a straightforward characterization of many natural phenomena and provide a
richer description than simple point data by detailing information at an aggregate level.
Our approach may be seen as addressing two sides of the same coin: on one side, we use
traditional machine learning algorithms adjusted to directly operate on inputs and outputs
that are probability functions (and sample sets); on the other side, we develop better
We begin by developing algorithms for traditional machine learning tasks for the cases
when one's input (and/or possibly output) is not a finite point, but is instead a distribution,
or sample set drawn from a distribution. We develop a scalable nonparametric estimator
for regressing a real-valued response given an input that is a distribution, a case which we
coin distribution to real regression (DRR). Furthermore, we extend this work to the case
when both the output response and the input covariate are distributions; a task we call
distribution to distribution regression (DDR).
After, we look to expand the versatility and efficacy of traditional machine learning
tasks through novel methods that operate with distributions of features. For example, we
show that one may improve the performance of kernel learning tasks by learning a kernel's
spectral distribution in a data-driven fashion using Bayesian nonparametric techniques.
Moreover, we study how to perform sequential modeling by looking at summary statistics
from past points. Lastly, we also develop methods for high-dimensional density estimation
that make use of flexible transformations of variables and autoregressive conditionals.

2018-07-01

• Dissertation

Department

• Machine Learning

Degree Name

• Doctor of Philosophy (PhD)