posted on 1980-01-01, 00:00authored byChristopher R Palmer, Phillip B Gibbons, Christos Faloutsos
Graphs are an increasingly important data source, with such
important graphs as the Internet and the Web. Other familiar graphs include CAD circuits, phone records, gene sequences, city streets, social networks and academic citations.
Any kind of relationship, such as actors appearing in movies,
can be represented as a graph. This work presents a data
mining tool, called ANF, that can quickly answer a number
of interesting questions on graph-represented data, such as
the following. How robust is the Internet to failures? What
are the most influential database papers? Are there gender differences in movie appearance patterns? At its core,
ANF is based on a fast and memory-efficient approach for
approximating the complete “neighbourhood function” for a
graph. For the Internet graph (268K nodes), ANF’s highly-
accurate approximation is more than 700 times faster than
the exact computation. This reduces the running time from
nearly a day to a matter of a minute or two, allowing
users to perform ad hoc drill-down tasks and to repeatedly
answer questions about changing data sources. To enable
this drill-down, ANF employs new techniques for approximating neighbourhood-type functions for graphs with distinguished nodes and/or edges. When compared to the best
existing approximation, ANF’s approach is both faster and
more accurate, given the same resources. Additionally, unlike previous approaches, ANF scales gracefully to handle
disk resident graphs. Finally, we present some of our results
from mining large graphs using ANF.