# Graphical Models and Overlay Networks for Reasoning about Large Distributed Systems

This thesis examines reasoning under uncertainty in distributed systems. Unlike in centralized
systems, where the observations reside in a single location, the observations in distributed
systems are often scattered across the network. To reason accurately, a networked
device often needs to incorporate observations from other nodes, and it must do so with limited
computation and communication, even for large problems. The reasoning is further complicated
by unstable network conditions, characteristic of many real-world networks: nodes
may fail, communication links may become unreliable, and the entire network may become fragmented
into several components that cannot communicate with each other. These aspects
make distributed inference very challenging.

We consider one general problem, distributed filtering for estimating the state of a dynamical
system, and three independent applications: simultaneous localization and tracking,
where a camera network localizes itself by observing a moving object; internal localization of
large-scale modular robots, where a robot determines the relative poses of its internal parts;
and collaborative filtering for providing recommendations in a peer-to-peer network. These
problems share a common theme: each can be described by a graphical
model that permits a compact representation of the problem and efficient reasoning about it. Using
graphical models, we design algorithms that address challenges such as inconsistency of
node beliefs in fragmented networks and difficult local optima in modular robot localization.
Due to the complexity of the reasoning tasks, it is not sufficient to coordinate the nodes locally
within each node’s immediate physical neighborhood. Instead, our algorithms employ
overlay networks (distributed data structures built on top of the physical networks) to coordinate
among distant nodes. The resulting algorithms obey the communication constraints
imposed by the network while solving the problems robustly.

We evaluate our algorithms on data from real sensor networks and on a realistic deployment
on the PlanetLab network. We demonstrate robustness to network fluctuations and show that, in
some cases, our distributed algorithms improve upon state-of-the-art centralized approaches.