information through social networks are modeled as a diffusion over a network of nodes. Localizing the

source of diffusion becomes crucial for quarantine efforts, as well as for identifying trendsetters. In many

real world networks, due to their size, it is unfeasible to observe the infection times of all nodes. In this case,

identification of the source node is carried out based only on a subset of nodes, called observer nodes. The

choice of observer nodes heavily impacts the accuracy of source localization.

In this thesis we specifically focus on the analysis and design of observer selection strategies with the

goal of understanding which are the best nodes to observe that contribute the most to the disambiguation of

the source on graphs for different diffusion scenarios. There are four main contributions of our work:

• We model the dynamics of network diffusion in a purely deterministic scenario as a linear timevarying

system, and present a necessary and sufficient condition for exact source localization in the

context of the proposed model, as well as in the context of graph theory. Relating the observer selection

problem to a known problem in graph theory, we design observer selection strategies with performance

guarantees.

• We present necessary and sufficient conditions for exact source localization when the network topology

is not fully known as the edges between different communities are hidden, and the components are all

trees, cycles, grids and complete graphs. We also give sufficient conditions when the components are

of arbitrary structure.

• We formulate a metric, based on error exponents, that can be used to compare different observer subsets

from the perspective of source localization error. We evaluate the metric for three different diffusion

scenarios; in each, the infection times are modeled as random variables with different distributions:

Gaussian, Laplace and exponential.

• We develop different sequential selection strategies, optimal and sub-optimal, for dynamic observer

selection in both deterministic and stochastic setting, and demonstrate the applicability of one of the

proposed strategies on a real-world dataset of a cholera outbreak.