journal contribution
posted on 1999-01-01, 00:00 authored by Scott Davies, Andrew Moore

The recent explosion in research on probabilistic data mining algorithms such as Bayesian networks has focused primarily on their use in diagnostics, prediction, and efficient inference. In this paper, we examine the use of Bayesian networks for a different purpose: lossless compression of large datasets. We present algorithms for automatically learning Bayesian networks and new structures called "Huffman networks" that model statistical relationships in the datasets, and algorithms for using these models to compress the datasets. These algorithms often achieve significantly better compression ratios than common dictionary-based algorithms such as those used by programs like ZIP.
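The abstract does not spell out the coding scheme, but a minimal sketch of the core idea (learn conditional distributions over dataset fields, then derive Huffman codes from those distributions to encode each field given its context) might look like the Python below. The chain-structured model in which each column is conditioned on the previous one, the helper names, and the two-pass design are illustrative assumptions, not the paper's algorithm.

```python
# Sketch: model-driven Huffman coding of a tabular dataset.
# Assumption: a simple chain "network" where column i is coded
# given column i-1; the paper's learned structures are richer.
import heapq
from collections import Counter, defaultdict
from itertools import count

def huffman_code(freqs):
    """Build a prefix code (symbol -> bitstring) from symbol frequencies."""
    if len(freqs) == 1:                  # degenerate single-symbol context
        return {next(iter(freqs)): "0"}
    tiebreak = count()                   # keeps heap comparisons on ints
    heap = [(f, next(tiebreak), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f0, _, c0 = heapq.heappop(heap)
        f1, _, c1 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c0.items()}
        merged.update({s: "1" + b for s, b in c1.items()})
        heapq.heappush(heap, (f0 + f1, next(tiebreak), merged))
    return heap[0][2]

def compress(records):
    """Encode records column by column; column i is coded given column i-1."""
    ncols = len(records[0])
    # Pass 1: learn conditional frequency tables from the data itself.
    tables = [defaultdict(Counter) for _ in range(ncols)]
    for rec in records:
        for i, val in enumerate(rec):
            parent = rec[i - 1] if i > 0 else None
            tables[i][parent][val] += 1
    codes = [{p: huffman_code(c) for p, c in tables[i].items()}
             for i in range(ncols)]
    # Pass 2: emit one Huffman codeword per field, chosen by its context.
    bits = "".join(
        codes[i][rec[i - 1] if i > 0 else None][rec[i]]
        for rec in records for i in range(ncols)
    )
    return bits, codes

if __name__ == "__main__":
    data = [("sunny", "hot"), ("sunny", "hot"), ("rain", "cool"),
            ("rain", "cool"), ("sunny", "mild")] * 20
    bits, _ = compress(data)
    print(f"{len(data)} records -> {len(bits)} bits "
          f"({len(bits) / len(data):.2f} bits/record)")
```

In a real compressor the frequency tables (or the model itself) must also be transmitted so the decoder can rebuild the same codes; that overhead is omitted here for brevity.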
Publisher Statement
Copyright © 1999 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept., ACM, Inc., fax +1 (212) 869-0481, or permissions@acm.org.
© ACM, 1999. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ISBN 1-58113-143-7 (1999).
http://doi.acm.org/10.1145/312129.312289

Date
1999-01-01