Given a large sparse graph, how can we find patterns and
anomalies? Several important applications can be modeled
as large sparse graphs, e.g., network traffic monitoring,
research citation network analysis, social network analysis,
and regulatory networks in genes. Low rank decompositions,
such as SVD and CUR, are powerful techniques for revealing
latent/hidden variables and associated patterns from high
dimensional data. However, those methods often ignore the
sparsity property of the graph, and hence usually incur too
high memory and computational cost to be practical.
We propose a novel method, the Compact Matrix Decomposition
(CMD), to compute sparse low rank approximations.
CMD dramatically reduces both the computation
cost and the space requirements over existing decomposition
methods (SVD, CUR). Using CMD as the key building
block, we further propose procedures to efficiently construct
and analyze dynamic graphs from real-time application
data. We provide theoretical guarantee for our methods,
and present results on two real, large datasets, one on network
flow data (100GB trace of 22K hosts over one month)
and one on DBLP (200MB over 25 years).
We show that CMD is often an order of magnitude more
efficient than the state of the art (SVD and CUR): it is over
10X faster, but requires less than 1/10 of the space, for the
same reconstruction accuracy. Finally, we demonstrate how
CMD is used for detecting anomalies and monitoring time-evolving
graphs, in which it successfully detects worm-like
hierarchical scanning patterns in real network data.