Reducing DRAM Latency at Low Cost by Exploiting Heterogeneity

2016-05-01T00:00:00Z (GMT) by Donghyuk Lee
In modern systems, DRAM-based main memory is significantly slower than the processor.
Consequently, processors spend a long time waiting to access data from main memory, making
the long main memory access latency one of the most critical bottlenecks to achieving high
system performance. Unfortunately, the latency of DRAM has remained almost constant over
the past decade. This is mainly because DRAM has been optimized for cost-per-bit rather
than for access latency. As a result, DRAM latency is not decreasing with technology scaling, and
it continues to be an important performance bottleneck in modern and future systems.
This dissertation pursues three major directions to achieve low-latency DRAM-based memory
systems at low cost. The key idea common to all three is to enable and exploit latency
heterogeneity in DRAM architecture. First, based on the observation that long
bitlines in DRAM are one of the dominant sources of DRAM latency, we propose a new
DRAM architecture, Tiered-Latency DRAM (TL-DRAM), which divides the long bitline into
two shorter segments using an isolation transistor, allowing one segment to be accessed with
reduced latency. Second, we propose a fine-grained DRAM latency reduction mechanism,
Adaptive-Latency DRAM (AL-DRAM), which optimizes DRAM latency for the common operating
conditions of each individual DRAM module. We observe that DRAM manufacturers incorporate
a very large timing margin as a provision against the worst-case operating condition, namely
accessing the slowest cell across all DRAM products at the highest temperature, even though
such a slow cell and such an operating condition are rare. Our mechanism dynamically
optimizes DRAM latency for the current operating condition of the accessed DRAM module,
thereby reliably improving system performance. Third, we observe
that cells closer to the peripheral logic can be much faster than cells farther from the peripheral
logic (a phenomenon we call architectural variation). Based on this observation, we propose a
new technique, Architectural-Variation-Aware DRAM (AVA-DRAM), which reduces DRAM
latency at low cost by profiling and identifying only the inherently slower regions in DRAM
to dynamically determine the lowest latency at which DRAM can operate without causing failures.
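The abstract describes AVA-DRAM only at a high level. The Python sketch below illustrates the general profile-then-select idea under stated assumptions, and is not the dissertation's implementation. The `Region` type, the `required_latency_ns` toy model, the `region_passes` check, and all latency numbers are hypothetical and are not measured DRAM data. The sketch assumes that (a) the regions farthest from the peripheral logic are the slowest, so a latency that works for them works everywhere, and (b) failures are monotonic in latency, so the search can stop at the first failing value.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Hypothetical DRAM region (e.g., a group of rows) and its distance from
    the peripheral logic, in arbitrary units."""
    name: str
    distance: int


def required_latency_ns(region: Region) -> float:
    """Toy model (an assumption, not measured data): the latency a region
    needs grows linearly with its distance from the peripheral logic."""
    return 10.0 + 0.5 * region.distance


def region_passes(region: Region, latency_ns: float) -> bool:
    """Stand-in for a real profiling test (write a data pattern, read it back
    with the given timing, compare). Here it only consults the toy model."""
    return latency_ns >= required_latency_ns(region)


def slowest_regions(regions: list[Region], k: int) -> list[Region]:
    """Architectural variation: assume the k regions farthest from the
    peripheral logic are the inherently slowest, so only they are profiled."""
    return sorted(regions, key=lambda r: r.distance, reverse=True)[:k]


def profile_lowest_latency(regions: list[Region], candidates_ns: list[float],
                           k: int = 2) -> float | None:
    """Return the lowest candidate latency at which every profiled slow region
    passes, or None if even the highest candidate fails."""
    targets = slowest_regions(regions, k)
    best = None
    for latency in sorted(candidates_ns, reverse=True):  # try highest first
        if all(region_passes(r, latency) for r in targets):
            best = latency
        else:
            break  # assume failures are monotonic: lower values also fail
    return best


regions = [Region(f"rows-{i}", d) for i, d in enumerate([0, 4, 8, 12, 16])]
candidates = [13.75, 15.0, 16.25, 17.5, 18.75, 20.0]
print(profile_lowest_latency(regions, candidates))  # 18.75
```

Profiling only two of the five regions suffices here because the toy model makes required latency grow monotonically with distance. On real hardware, which regions to test and whether to add a safety guardband would have to come from characterization data.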
This dissertation provides a detailed analysis of DRAM latency using both circuit-level
simulation with a detailed DRAM model and FPGA-based profiling of real DRAM modules.
Our latency analysis shows that our low-latency DRAM mechanisms enable significant latency
reductions, leading to large improvements in both system performance and energy efficiency
across a variety of workloads in our evaluated systems, while ensuring reliable DRAM operation.