Cached Sufficient Statistics for Automated Mining and Discovery from Massive Data Sources

Moore, Andrew; Schneider, Jeff; Anderson, Brigham; Davies, Scott; Komarek, Paul; Lee, Mary Soon; Meila, Marina; Munos, Remi; Myers, Kary; Pelleg, Pan

doi:10.1184/R1/6552224.v1

journal contribution

posted on 1999-01-01, 00:00 authored by Andrew Moore, Jeff Schneider, Brigham Anderson, Scott Davies, Paul Komarek, Mary Soon Lee, Marina Meila, Remi Munos, Kary Myers, Pan Pelleg

There are many massive databases in industry and science. There are also many ways that decision makers, scientists, and the public need to interact with these data sources. Wide-ranging statistics and machine learning algorithms similarly need to query databases, sometimes millions of times for a single inference. With millions or billions of records (e.g. biotechnology databases, inventory management systems, astrophysics sky surveys, corporate sales information, science lab data repositories), this can be intractable using current algorithms. The Auton Lab (at Carnegie Mellon University) and Schenley Park Research Inc. (a start-up company), both jointly run by Andrew Moore and Jeff Schneider, are concerned with the fundamental computer science of making very advanced data analysis techniques computationally feasible for massive datasets.

Keywords

Robotics

Licence

In Copyright
