Localizing a Diffusion Source on Graphs: Analysis and Design of Node Selection Strategies

2018-12-14T17:56:57Z (GMT) by Sabina Zejnilovic
Many different phenomena, such as spreading of infectious diseases in networks and dissemination of<br>information through social networks are modeled as a diffusion over a network of nodes. Localizing the<br>source of diffusion becomes crucial for quarantine efforts, as well as for identifying trendsetters. In many<br>real world networks, due to their size, it is unfeasible to observe the infection times of all nodes. In this case,<br>identification of the source node is carried out based only on a subset of nodes, called observer nodes. The<br>choice of observer nodes heavily impacts the accuracy of source localization.<br>In this thesis we specifically focus on the analysis and design of observer selection strategies with the<br>goal of understanding which are the best nodes to observe that contribute the most to the disambiguation of<br>the source on graphs for different diffusion scenarios. There are four main contributions of our work:<br>• We model the dynamics of network diffusion in a purely deterministic scenario as a linear timevarying<br>system, and present a necessary and sufficient condition for exact source localization in the<br>context of the proposed model, as well as in the context of graph theory. Relating the observer selection<br>problem to a known problem in graph theory, we design observer selection strategies with performance<br>guarantees.<br>• We present necessary and sufficient conditions for exact source localization when the network topology<br>is not fully known as the edges between different communities are hidden, and the components are all<br>trees, cycles, grids and complete graphs. We also give sufficient conditions when the components are<br>of arbitrary structure.<br>• We formulate a metric, based on error exponents, that can be used to compare different observer subsets<br>from the perspective of source localization error. We evaluate the metric for three different diffusion<br>scenarios; in each, the infection times are modeled as random variables with different distributions:<br>Gaussian, Laplace and exponential.<br>• We develop different sequential selection strategies, optimal and sub-optimal, for dynamic observer<br>selection in both deterministic and stochastic setting, and demonstrate the applicability of one of the<br>proposed strategies on a real-world dataset of a cholera outbreak.