## Query-Specific Learning and Inference for Probabilistic Graphical Models

#### thesis

In numerous real-world applications, from sensor networks to computer vision to natural language processing, one needs to reason about the system in question in the face of uncertainty. A key problem in all of these settings is to compute the probability distribution over the variables of interest (the query) given the observed values of other random variables (the evidence). Probabilistic graphical models (PGMs) have become the approach of choice for representing and reasoning with high-dimensional probability distributions. However, for most models capable of accurately representing real-life distributions, inference is fundamentally intractable. As a result, optimally balancing the expressive power and inference complexity of the models, as well as designing better approximate inference algorithms, remain important open problems with the potential to significantly improve the quality of answers to probabilistic queries.
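To make the query/evidence terminology concrete, here is a minimal sketch that answers a conditional query by brute-force summation over a full joint table. The toy variables (`rain`, `sprinkler`, `wet`), their probabilities, and the helper names `toy_joint` and `conditional_query` are illustrative assumptions, not taken from the thesis. The table has one entry per joint assignment, so its size grows exponentially with the number of variables, which is the cost that motivates structured models.

```python
from itertools import product


def toy_joint():
    """Build a hypothetical joint distribution over (rain, sprinkler, wet)."""
    joint = {}
    for r, s, w in product((0, 1), repeat=3):
        p_r = 0.2 if r else 0.8              # assumed prior on rain
        p_s = 0.3 if s else 0.7              # assumed prior on sprinkler
        p_wet = 0.9 if (r or s) else 0.1     # assumed P(wet=1 | rain, sprinkler)
        p_w = p_wet if w else 1.0 - p_wet
        joint[(r, s, w)] = p_r * p_s * p_w
    return joint


def conditional_query(joint, variables, query, evidence):
    """Return P(query | evidence) as a dict keyed by query assignments.

    joint:     dict mapping full assignments (tuples ordered as `variables`)
               to probabilities
    query:     tuple of variable names of interest
    evidence:  dict mapping observed variable names to their values
    """
    index = {name: i for i, name in enumerate(variables)}
    result = {}
    for assignment, p in joint.items():
        # Skip assignments that contradict the evidence.
        if any(assignment[index[v]] != val for v, val in evidence.items()):
            continue
        key = tuple(assignment[index[q]] for q in query)
        result[key] = result.get(key, 0.0) + p
    total = sum(result.values())
    if total == 0.0:
        raise ValueError("evidence has zero probability")
    return {k: v / total for k, v in result.items()}


variables = ("rain", "sprinkler", "wet")
posterior = conditional_query(toy_joint(), variables, ("rain",), {"wet": 1})
print(posterior)
```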

This thesis contributes algorithms for learning and approximate inference in probabilistic graphical models that improve on the state of the art by emphasizing the computational aspects of inference over the representational properties of the models. Our contributions fall into two categories: (1) learning accurate models in which exact inference is tractable, and (2) speeding up approximate inference by focusing computation on the query variables and spending only as much effort on the remaining parts of the model as is needed to answer the query accurately.

First, for the case when the set of evidence variables is not known in advance and a single model is needed that can answer any query well, we propose a polynomial-time algorithm for learning the structure of tractable graphical models with quality guarantees, including PAC learnability and graceful degradation guarantees. Ours is the first efficient algorithm to provide guarantees of this type. A key theoretical insight of our approach is a tractable upper bound on the mutual information of arbitrarily large sets of random variables that yields exponential speedups over the exact computation.
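For reference, the exact quantity being bounded can be computed as in the sketch below, which evaluates the mutual information between two sets of variables directly from a joint table. This is the standard definition, not the thesis's upper bound; the helper names `marginal` and `mutual_information` and the three-variable toy distribution are assumptions made for illustration. The marginal tables grow exponentially with the sizes of the two sets, which is why the exact computation does not scale to large sets.

```python
import math
from itertools import product


def marginal(joint, positions):
    """Marginalize a joint table onto the variables at the given positions."""
    out = {}
    for assignment, p in joint.items():
        key = tuple(assignment[i] for i in positions)
        out[key] = out.get(key, 0.0) + p
    return out


def mutual_information(joint, set_a, set_b):
    """Exact I(A; B) in nats for disjoint position tuples set_a and set_b."""
    set_a, set_b = tuple(set_a), tuple(set_b)
    p_a = marginal(joint, set_a)
    p_b = marginal(joint, set_b)
    p_ab = marginal(joint, set_a + set_b)
    mi = 0.0
    for key, p in p_ab.items():
        if p > 0.0:
            a, b = key[:len(set_a)], key[len(set_a):]
            mi += p * math.log(p / (p_a[a] * p_b[b]))
    return mi


# Hypothetical toy distribution: x1 is a copy of x0, x2 is an independent fair bit.
joint = {}
for a, c in product((0, 1), repeat=2):
    joint[(a, a, c)] = 0.25

print(mutual_information(joint, (0,), (1,)))    # copied bit
print(mutual_information(joint, (0,), (2,)))    # independent bit
print(mutual_information(joint, (0, 2), (1,)))  # a set of two variables vs. one
```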

Second, for a setting where the set of evidence variables is known in advance, we propose an approach for learning tractable models that tailors the structure of the model to the particular values of evidence that become known at test time. By avoiding a commitment to a single tractable structure during learning, we are able to expand the representation power of the model without sacrificing efficient exact inference and parameter learning. We provide a general framework that allows one to leverage existing structure learning algorithms for discovering high-quality evidence-specific structures. Empirically, we demonstrate state-of-the-art accuracy on real-life datasets and an order of magnitude speedup.

Finally, for applications where the intractable model structure is given and approximate inference is needed, we propose a principled way to speed up convergence of belief propagation by focusing the computation on the query variables and away from the variables that are of no direct interest to the user. We demonstrate significant speedups over the state of the art on large-scale relational models. Unlike existing approaches, ours does not involve model simplification, and thus has the advantage of converging to the fixed point of the full model.
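The general idea of prioritizing message updates near the query can be sketched as follows. This is not the thesis's algorithm: it is a generic residual-style schedule for sum-product belief propagation on a pairwise model, in which the weighting `decay ** distance_to_query` is a hypothetical stand-in for a proper measure of a message's importance to the query. All function names, the `decay` parameter, and the toy chain are assumptions. The schedule only changes the order of updates, and the stopping test uses the unweighted residuals of all messages, so the result is a fixed point of the full model. For clarity the sketch recomputes every candidate message at each step; a practical implementation would maintain a priority queue instead.

```python
from collections import deque


def neighbors(edges, i):
    """Nodes adjacent to i in an undirected edge dict keyed by (a, b)."""
    return [b if a == i else a for (a, b) in edges if i in (a, b)]


def pair_potential(edges, i, j, xi, xj):
    """Look up psi(x_i, x_j) regardless of the stored edge orientation."""
    if (i, j) in edges:
        return edges[(i, j)][xi][xj]
    return edges[(j, i)][xj][xi]


def compute_message(unary, edges, messages, i, j):
    """Sum-product message from node i to node j, normalized to sum to 1."""
    out = []
    for xj in range(len(unary[j])):
        total = 0.0
        for xi in range(len(unary[i])):
            incoming = 1.0
            for k in neighbors(edges, i):
                if k != j:
                    incoming *= messages[(k, i)][xi]
            total += unary[i][xi] * pair_potential(edges, i, j, xi, xj) * incoming
        out.append(total)
    z = sum(out)
    return [v / z for v in out]


def distances_from(edges, sources):
    """Breadth-first graph distance from the nearest source node."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        i = queue.popleft()
        for k in neighbors(edges, i):
            if k not in dist:
                dist[k] = dist[i] + 1
                queue.append(k)
    return dist


def query_weighted_bp(unary, edges, query, decay=0.5, tol=1e-9, max_updates=10000):
    """Update the message with the largest query-weighted residual until all
    unweighted residuals fall below tol; return beliefs for the query nodes."""
    directed = list(edges) + [(b, a) for (a, b) in edges]
    messages = {(i, j): [1.0 / len(unary[j])] * len(unary[j]) for (i, j) in directed}
    dist = distances_from(edges, query)
    # Hypothetical importance weight: messages arriving closer to the query count more.
    weight = {(i, j): decay ** dist.get(j, len(unary)) for (i, j) in directed}
    for _ in range(max_updates):
        candidates = {e: compute_message(unary, edges, messages, *e) for e in directed}
        residual = {e: max(abs(new - old) for new, old in zip(candidates[e], messages[e]))
                    for e in directed}
        if max(residual.values()) < tol:
            break
        best = max(directed, key=lambda e: weight[e] * residual[e])
        messages[best] = candidates[best]
    beliefs = {}
    for q in query:
        belief = list(unary[q])
        for k in neighbors(edges, q):
            belief = [b * m for b, m in zip(belief, messages[(k, q)])]
        z = sum(belief)
        beliefs[q] = [b / z for b in belief]
    return beliefs


# Hypothetical three-node chain 0 - 1 - 2 with binary variables.
unary = [[0.7, 0.3], [0.5, 0.5], [0.2, 0.8]]
edges = {(0, 1): [[2.0, 1.0], [1.0, 2.0]], (1, 2): [[3.0, 1.0], [1.0, 3.0]]}
print(query_weighted_bp(unary, edges, query=[0]))
```

Because the toy model is a tree, belief propagation is exact here, so the printed belief can be checked against brute-force enumeration; on loopy models the same schedule yields the usual approximate beliefs.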

More generally, we argue that the common approach of concentrating on the structure of representation provided by PGMs, and structuring the computation only as far as the representation allows, is suboptimal because of the fundamental computational problems. It is the computation that eventually yields answers to the queries, so focusing directly on the structure of computation is a natural direction for improving the quality of the answers. The results of this thesis are a step towards adopting the structure of computation as a foundation of graphical models.