posted on 2005-01-01, 00:00authored byShoba Venkataraman, Dawn Song, Phillip B Gibbons, Avrim Blum
High-speed monitoring of Internet traffic is an important
and challenging problem, with applications to realtime
attack detection and mitigation, traffic engineering,
etc. However, packet-level monitoring requires fast
streaming algorithms that use very little memory and little
communication among collaborating network monitoring
points.
In this paper, we consider the problem of detecting
superspreaders, which are sources that connect to
a large number of distinct destinations. We propose
new streaming algorithms for detecting superspreaders
and prove guarantees on their accuracy and memory
requirements. We also show experimental results
on real network traces. Our algorithms are substantially
more efficient (both theoretically and experimentally)
than previous approaches. We also extend our algorithms
to identify superspreaders in a distributed setting,
with sliding windows, and when deletions are allowed
in the stream (which lets us identify sources that
make a large number of failed connections to distinct
destinations).
More generally, our algorithms are applicable to any
problem that can be formulated as follows: given a
stream of (x; y) pairs, find all the x’s that are paired
with a large number of distinct y’s. We call this the
heavy distinct-hitters problem. There are many network
security applications of this general problem. This paper
discusses these applications and, for concreteness,
focuses on the superspreader problem.