posted on 2016-05-01, 00:00 authored by Donghyuk Lee
In modern systems, DRAM-based main memory is significantly slower than the processor. Consequently, processors spend a long time waiting to access data from main memory, making the long main memory access latency one of the most critical bottlenecks to achieving high system performance. Unfortunately, the latency of DRAM has remained almost constant over the past decade, mainly because DRAM has been optimized for cost-per-bit rather than access latency. As a result, DRAM latency is not decreasing with technology scaling, and it continues to be an important performance bottleneck in modern and future systems. This dissertation seeks to achieve low-latency DRAM-based memory systems at low cost in three major directions. The key idea underlying all three directions is to enable and exploit latency heterogeneity in the DRAM architecture. First, based on the observation that long bitlines in DRAM are one of the dominant sources of DRAM latency, we propose a new DRAM architecture, Tiered-Latency DRAM (TL-DRAM), which divides the long bitline into two shorter segments using an isolation transistor, allowing one segment to be accessed with reduced latency. Second, we propose a fine-grained DRAM latency reduction mechanism, Adaptive-Latency DRAM, which optimizes DRAM latency for the common operating conditions of each individual DRAM module. We observe that DRAM manufacturers incorporate a very large timing margin as a provision against the worst-case operating conditions, namely accessing the slowest cell (the one with the worst latency) across all DRAM products at the highest temperature, even though such a slow cell and such an operating condition are rare. Our mechanism dynamically optimizes DRAM latency for the current operating condition of the accessed DRAM module, thereby reliably improving system performance. Third, we observe that cells closer to the peripheral logic can be much faster than cells farther from the peripheral logic (a phenomenon we call architectural variation). 
Based on this observation, we propose a new technique, Architectural-Variation-Aware DRAM (AVA-DRAM), which reduces DRAM latency at low cost by profiling and identifying only the inherently slower regions in DRAM to dynamically determine the lowest latency at which DRAM can operate without causing failures. This dissertation provides a detailed analysis of DRAM latency using both circuit-level simulation with a detailed DRAM model and FPGA-based profiling of real DRAM modules. Our latency analysis shows that our low-latency DRAM mechanisms enable significant latency reductions, leading to large improvements in both system performance and energy efficiency across a variety of workloads in our evaluated systems, while ensuring reliable DRAM operation.