Localizing a Diffusion Source on Graphs: Analysis and Design of Node Selection Strategies

2016-05-01T00:00:00Z (GMT) by Sabina Zejnilovic
Many different phenomena, such as spreading of infectious diseases in networks and dissemination of
information through social networks are modeled as a diffusion over a network of nodes. Localizing the
source of diffusion becomes crucial for quarantine efforts, as well as for identifying trendsetters. In many
real world networks, due to their size, it is unfeasible to observe the infection times of all nodes. In this case,
identification of the source node is carried out based only on a subset of nodes, called observer nodes. The
choice of observer nodes heavily impacts the accuracy of source localization.
In this thesis we specifically focus on the analysis and design of observer selection strategies with the
goal of understanding which are the best nodes to observe that contribute the most to the disambiguation of
the source on graphs for different diffusion scenarios. There are four main contributions of our work:
• We model the dynamics of network diffusion in a purely deterministic scenario as a linear timevarying
system, and present a necessary and sufficient condition for exact source localization in the
context of the proposed model, as well as in the context of graph theory. Relating the observer selection
problem to a known problem in graph theory, we design observer selection strategies with performance
guarantees.
• We present necessary and sufficient conditions for exact source localization when the network topology
is not fully known as the edges between different communities are hidden, and the components are all
trees, cycles, grids and complete graphs. We also give sufficient conditions when the components are
of arbitrary structure.
• We formulate a metric, based on error exponents, that can be used to compare different observer subsets
from the perspective of source localization error. We evaluate the metric for three different diffusion
scenarios; in each, the infection times are modeled as random variables with different distributions:
Gaussian, Laplace and exponential.
• We develop different sequential selection strategies, optimal and sub-optimal, for dynamic observer
selection in both deterministic and stochastic setting, and demonstrate the applicability of one of the
proposed strategies on a real-world dataset of a cholera outbreak.