Linearly Compressed Pages: A Main Memory Compression Framework with Low Complexity and Low Latency
Data compression is a promising technique to address the increasing main memory capacity demand in future systems. Unfortunately, directly applying previously proposed compression algorithms to main memory requires the memory controller to perform non-trivial computations to locate a cache line within the compressed main memory. These additional computations lead to a significant increase in access latency, which can degrade system performance. Solutions proposed by prior work to address this performance degradation problem are either costly or energy-inefficient.
In this paper, we propose a new main memory compression framework that neither incurs the latency penalty nor requires costly or power-inefficient hardware. The key idea behind our proposal is that if all the cache lines within a page are compressed to the same size, then the location of a cache line within a compressed page is simply the product of the index of the cache line within the page and the size of a compressed cache line. We call a page compressed in such a manner a Linearly Compressed Page (LCP). LCP greatly reduces the amount of computation required to locate a cache line within the compressed page, while keeping the hardware implementation of the proposed main memory compression framework simple.
We adapt two previously proposed compression algorithms, Frequent Pattern Compression and Base-Delta-Immediate compression, to fit the requirements of LCP. Evaluations using benchmarks from SPEC CPU 2006 and five server benchmarks show that our approach can significantly increase the effective memory capacity (69% on average). In addition to the capacity gains, we evaluate the benefit of transferring consecutive compressed cache lines between the memory controller and main memory. Our new mechanism considerably reduces the memory bandwidth requirements of most of the evaluated benchmarks (46%/48% for CPU/GPU on average), and improves overall performance (6.1%/13.9%/10.7% for single-/two-/four-core CPU workloads on average) compared to a baseline system that does not employ main memory compression.