posted on 2004-04-03, 00:00authored byJure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanbriesenJeanne Vanbriesen, Natalie Glance
Given a water distribution network, where should we place sensors to quickly detect contaminants?
Or, which blogs should we read to avoid missing important stories?
These seemingly different problems share common structure: Outbreak detection can be modeled
as selecting nodes (sensor locations, blogs) in a network, in order to detect the spreading
of a virus or information as quickly as possible.
We present a general methodology for near optimal sensor placement in these and related
problems. We demonstrate that many realistic outbreak detection objectives (e.g., detection
likelihood, population affected) exhibit the property of “submodularity”. We exploit submodularity
to develop an efficient algorithm that scales to large problems, achieving near
optimal placements, while being 700 times faster than a simple greedy algorithm. We also
derive online bounds on the quality of the placements obtained by any algorithm. Our algorithms
and bounds also handle cases where nodes (sensor locations, blogs) have different
costs.
We evaluate our approach on several large real-world problems, including a model of a water
distribution network from the EPA, and real blog data. The obtained sensor placements are
provably near optimal, providing a constant fraction of the optimal solution. We show that
the approach scales, achieving speedups and savings in storage of several orders of magnitude.
We also show how the approach leads to deeper insights in both applications, answering
multicriteria trade-off, cost-sensitivity and generalization questions.