a lot of this progress has been limited to basic point-estimation tasks. That is,

a large bulk of attention has been geared at solving problems that take in a static nite

vector and map it to another static nite vector. However, we do not navigate through life

in a series of point-estimation problems, mapping x to y. Instead, we nd broad patterns

and gather a far-sighted understanding of data by considering collections of points like

sets, sequences, and distributions. Thus, contrary to what various billionaires, celebrity

theoretical physicists, and sci- classics would lead you to believe, true machine intelligence

is fairly out of reach currently. In order to bridge this gap, this thesis develops algorithms

that understand data at an aggregate, holistic level.

This thesis pushes machine learning past the realm of operating over static nite vectors,

to start reasoning ubiquitously with complex, dynamic collections like sets and sequences.

We develop algorithms that consider distributions as functional covariates/responses, and

methods that use distributions as internal representations. We consider distributions since

they are a straightforward characterization of many natural phenomena and provide a

richer description than simple point data by detailing information at an aggregate level.

Our approach may be seen as addressing two sides of the same coin: on one side, we use

traditional machine learning algorithms adjusted to directly operate on inputs and outputs

that are probability functions (and sample sets); on the other side, we develop better

estimators for traditional tasks by making use of and adjusting internal distributions.

We begin by developing algorithms for traditional machine learning tasks for the cases

when one's input (and/or possibly output) is not a nite point, but is instead a distribution,

or sample set drawn from a distribution. We develop a scalable nonparametric estimator

for regressing a real valued response given an input that is a distribution, a case which we

coin distribution to real regression (DRR). Furthermore, we extend this work to the case

when both the output response and the input covariate are distributions; a task we call

distribution to distribution regression (DDR).

After, we look to expand the versatility and ecacy of traditional machine learning

tasks through novel methods that operate with distributions of features. For example, we

show that one may improve the performance of kernel learning tasks by learning a kernel's

spectral distribution in a data-driven fashion using Bayesian nonparametric techniques.

Moreover, we study how to perform sequential modeling by looking at summary statistics

from past points. Lastly, we also develop methods for high-dimensional density estimation

that make use of

exible transformations of variables and autoregressive conditionals.