Carnegie Mellon University

Bayesian Networks for Lossless Dataset Compression

journal contribution
posted on 1999-01-01, 00:00 authored by Scott Davies, Andrew Moore
The recent explosion in research on probabilistic data mining algorithms such as Bayesian networks has been focused primarily on their use in diagnostics, prediction, and efficient inference. In this paper, we examine the use of Bayesian networks for a different purpose: lossless compression of large datasets. We present algorithms for automatically learning Bayesian networks and new structures called "Huffman networks" that model statistical relationships in the datasets, and algorithms for using these models to then compress the datasets. These algorithms often achieve significantly better compression ratios than those achieved with common dictionary-based algorithms such as those used by programs like ZIP.
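The intuition behind the abstract's claim can be illustrated with a small calculation. The sketch below is not the authors' algorithm: it does not learn a network structure, build Huffman networks, or emit any compressed bits. It only compares the ideal code length (the sum of -log2 P(row) over all rows) of a toy two-column dataset under two models fitted to that same data: one that treats the columns as independent, and one that conditions each column on the previous one, as a fixed chain-shaped network would. The function names, the chain structure, and the toy dataset are all assumptions made for illustration, and the cost of storing the model itself is ignored.

```python
import math
from collections import Counter


def code_length_independent(rows):
    """Ideal bits to encode all rows when each column is modeled on its own."""
    n = len(rows)
    total = 0.0
    for col in range(len(rows[0])):
        counts = Counter(row[col] for row in rows)
        # Each value seen c times costs -log2(c / n) bits per occurrence.
        total += sum(-c * math.log2(c / n) for c in counts.values())
    return total


def code_length_chain(rows):
    """Ideal bits when column i is modeled conditional on column i - 1."""
    n = len(rows)
    first = Counter(row[0] for row in rows)
    total = sum(-c * math.log2(c / n) for c in first.values())
    for col in range(1, len(rows[0])):
        pair = Counter((row[col - 1], row[col]) for row in rows)
        parent = Counter(row[col - 1] for row in rows)
        # Cost uses P(child | parent) estimated from the data.
        total += sum(
            -c * math.log2(c / parent[p]) for (p, _), c in pair.items()
        )
    return total


# Hypothetical toy dataset: the two binary columns agree in 90 of 100 rows.
rows = [(0, 0)] * 45 + [(1, 1)] * 45 + [(0, 1)] * 5 + [(1, 0)] * 5

independent_bits = code_length_independent(rows)
chain_bits = code_length_chain(rows)
print(f"independent model: {independent_bits:.1f} bits")
print(f"chain model:       {chain_bits:.1f} bits")
```

Because the second column is strongly predictable from the first, the conditional model assigns the dataset a shorter ideal code length than the independent one. A real compressor would additionally need an entropy coder to approach these lengths and would have to pay for transmitting the model.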


Publisher Statement

Copyright © 1999 by the Association for Computing Machinery, Inc. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Publications Dept., ACM, Inc., fax +1 (212) 869-0481, or © ACM, 1999. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ISBN 1-58113-143-7), 1999.


