Carnegie Mellon University
Browse

ANF: A Fast and Scalable Tool for Data Mining in Massive Graphs

Download (209 kB)
journal contribution
posted on 1980-01-01, 00:00 authored by Christopher R Palmer, Phillip B Gibbons, Christos Faloutsos
Graphs are an increasingly important data source, with such important graphs as the Internet and the Web. Other familiar graphs include CAD circuits, phone records, gene sequences, city streets, social networks and academic citations. Any kind of relationship, such as actors appearing in movies, can be represented as a graph. This work presents a data mining tool, called ANF, that can quickly answer a number of interesting questions on graph-represented data, such as the following. How robust is the Internet to failures? What are the most influential database papers? Are there gender differences in movie appearance patterns? At its core, ANF is based on a fast and memory-efficient approach for approximating the complete “neighbourhood function” for a graph. For the Internet graph (268K nodes), ANF’s highly- accurate approximation is more than 700 times faster than the exact computation. This reduces the running time from nearly a day to a matter of a minute or two, allowing users to perform ad hoc drill-down tasks and to repeatedly answer questions about changing data sources. To enable this drill-down, ANF employs new techniques for approximating neighbourhood-type functions for graphs with distinguished nodes and/or edges. When compared to the best existing approximation, ANF’s approach is both faster and more accurate, given the same resources. Additionally, unlike previous approaches, ANF scales gracefully to handle disk resident graphs. Finally, we present some of our results from mining large graphs using ANF.

History

Publisher Statement

All Rights Reserved

Date

1980-01-01

Usage metrics

    Exports

    RefWorks
    BibTeX
    Ref. manager
    Endnote
    DataCite
    NLM
    DC