Localizing a Diffusion Source on Graphs: Analysis and Design of Node Selection Strategies

10.1184/R1/7461461.v1 Sabina Zejnilovic Sabina Zejnilovic Localizing a Diffusion Source on Graphs: Analysis and Design of Node Selection Strategies Carnegie Mellon University 2016 node selection strategies Electrical and Electronic Engineering not elsewhere classified 2016-05-01 00:00:00 Journal contribution https://kilthub.cmu.edu/articles/journal_contribution/Localizing_a_Diffusion_Source_on_Graphs_Analysis_and_Design_of_Node_Selection_Strategies/7461461 Many different phenomena, such as spreading of infectious diseases in networks and dissemination of information through social networks are modeled as a diffusion over a network of nodes. Localizing the source of diffusion becomes crucial for quarantine efforts, as well as for identifying trendsetters. In many real world networks, due to their size, it is unfeasible to observe the infection times of all nodes. In this case, identification of the source node is carried out based only on a subset of nodes, called observer nodes. The choice of observer nodes heavily impacts the accuracy of source localization. In this thesis we specifically focus on the analysis and design of observer selection strategies with the goal of understanding which are the best nodes to observe that contribute the most to the disambiguation of the source on graphs for different diffusion scenarios. There are four main contributions of our work: • We model the dynamics of network diffusion in a purely deterministic scenario as a linear timevarying system, and present a necessary and sufficient condition for exact source localization in the context of the proposed model, as well as in the context of graph theory. Relating the observer selection problem to a known problem in graph theory, we design observer selection strategies with performance guarantees. • We present necessary and sufficient conditions for exact source localization when the network topology is not fully known as the edges between different communities are hidden, and the components are all trees, cycles, grids and complete graphs. We also give sufficient conditions when the components are of arbitrary structure. • We formulate a metric, based on error exponents, that can be used to compare different observer subsets from the perspective of source localization error. We evaluate the metric for three different diffusion scenarios; in each, the infection times are modeled as random variables with different distributions: Gaussian, Laplace and exponential. • We develop different sequential selection strategies, optimal and sub-optimal, for dynamic observer selection in both deterministic and stochastic setting, and demonstrate the applicability of one of the proposed strategies on a real-world dataset of a cholera outbreak.