Posted on 1999-01-01. Authored by Scott Davies and Andrew Moore.
The recent explosion in research on probabilistic data mining algorithms such as Bayesian networks has focused primarily on their use in diagnostics, prediction, and efficient inference. In this paper, we examine the use of Bayesian networks for a different purpose: lossless
compression of large datasets. We present algorithms
for automatically learning Bayesian networks and new
structures called "Huffman networks" that model statistical
relationships in the datasets, and algorithms for using these
models to then compress the datasets. These algorithms
often achieve significantly better compression ratios than
common dictionary-based algorithms such as those used by
programs like ZIP.
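To illustrate the general idea, here is a toy sketch (not the paper's actual algorithm; all names are my own) of model-based compression in the spirit described: one attribute is coded with a marginal Huffman code, and a second, correlated attribute is coded with a separate Huffman code conditioned on the first attribute's value. When the attributes are statistically dependent, the conditional codes are shorter on average than codes built from marginal frequencies alone.

```python
import heapq
from collections import Counter

def huffman_code(freqs):
    """Build a Huffman prefix code (symbol -> bitstring) from a frequency map."""
    if len(freqs) == 1:                      # degenerate one-symbol alphabet
        return {next(iter(freqs)): "0"}
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    tie = len(heap)                          # tie-breaker so code dicts never get compared
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

def decode_symbol(bits, pos, inverse):
    """Read one symbol from a bitstring using an inverted prefix code."""
    word = ""
    while True:
        word += bits[pos]
        pos += 1
        if word in inverse:
            return inverse[word], pos

def encode(records):
    """Encode (a, b) records: a with a marginal code, b with one code per value of a."""
    a_code = huffman_code(Counter(a for a, _ in records))
    b_codes = {a: huffman_code(Counter(b for x, b in records if x == a))
               for a in set(a for a, _ in records)}
    bits = "".join(a_code[a] + b_codes[a][b] for a, b in records)
    return bits, a_code, b_codes

def decode(bits, n, a_code, b_codes):
    """Losslessly recover the n records from the bitstring and the two code tables."""
    inv_a = {v: k for k, v in a_code.items()}
    inv_b = {a: {v: k for k, v in c.items()} for a, c in b_codes.items()}
    out, pos = [], 0
    for _ in range(n):
        a, pos = decode_symbol(bits, pos, inv_a)
        b, pos = decode_symbol(bits, pos, inv_b[a])
        out.append((a, b))
    return out
```

On a dataset where the second attribute's distribution shifts sharply with the first, the conditional encoding round-trips losslessly and uses fewer bits than coding both attributes with independent marginal Huffman codes. (Real use would also have to transmit or re-derive the code tables, which this sketch ignores.)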