Architectural Techniques to Enhance DRAM Scaling

2015-06-01 by Yoongu Kim

For decades, main memory has enjoyed the continuous scaling of its physical substrate: DRAM (Dynamic Random Access Memory). But now, DRAM scaling has reached a threshold where DRAM cells cannot be made smaller without jeopardizing their robustness. This thesis identifies two specific challenges to DRAM scaling, and presents architectural techniques to overcome them.
First, DRAM cells are becoming less reliable. As DRAM process technology scales down to smaller dimensions, it is more likely for DRAM cells to electrically interfere with each other’s operation. We confirm this by exposing the vulnerability of the latest DRAM chips to a reliability problem called disturbance errors. By reading repeatedly from the same cell in DRAM, we show that it is possible to corrupt the data stored in nearby cells. We demonstrate this phenomenon on Intel and AMD systems using a malicious program that generates many DRAM accesses. We provide an extensive characterization of the errors, as well as their behavior, using a custom-built testing platform. After examining various potential ways of addressing the problem, we propose a low-overhead solution that effectively prevents the errors through a collaborative effort between the DRAM chips and the DRAM controller.
Second, DRAM cells are becoming slower due to worsening variation in DRAM process technology. To alleviate the latency bottleneck, we propose to unlock fine-grained parallelism within a DRAM chip so that many accesses can be served at the same time. We take a close look at how a DRAM chip is internally organized, and find that it is divided into small partitions of DRAM cells called subarrays. Although the subarrays are mostly independent, they occasionally rely upon some global circuit components that force the subarrays to be operated one at a time. To overcome this limitation, we devise a series of non-intrusive changes to DRAM architecture that increase the autonomy of the subarrays and allow them to be accessed concurrently. We show that such parallelism across subarrays provides large performance gains at low cost.
Lastly, we present a powerful DRAM simulator that facilitates the design space exploration of main memory. Unlike previous simulators, our simulator is easy to modify, allowing DRAM architectural changes to be modeled quickly and accurately. This is why our simulator is able to provide out-of-the-box support for a wide array of contemporary DRAM standards. Our simulator is also the fastest, outperforming the next fastest simulator by more than a factor of two.