Optimizing Data Movement Through Software Control of General-Purpose Hardware Caches
Computer systems are increasingly burdened by the rising cost of data movement. Moving data across the chip in a modern processor consumes orders of magnitude more energy than performing an arithmetic operation on that data. On-chip caches also constitute more than half of a chip’s area. The severity of these problems will continue to grow alongside rising core counts and data-processing requirements.
The underlying issue is that chip multiprocessors (CMPs) provide a compute-centric programming interface in which software views the entire memory hierarchy as a black box. Software issues loads and stores, and it is entirely up to hardware to manage all data movement between the core and main memory. Although this interface simplifies software, it forces hardware to resort to overly general, application-agnostic optimizations.
To overcome the limitations of compute-centric CMPs, prior work has proposed specialized hierarchies, which add custom logic to the memory hierarchy to enable novel features that reduce data movement. Specialized hierarchies reduce data movement by either moving data closer to compute (data placement) or moving compute closer to data (near-data computing). These data-centric systems often provide significant benefits by customizing data movement within the memory hierarchy to specific applications. Unfortunately, adding custom logic to CMPs for every possible application domain is not a scalable solution.
The goal of this thesis is to make specialized hierarchies practical by letting software customize data movement, eliminating the need for application-specific custom hardware. Our proposed systems address both data placement and near-data computing (NDC). First, Jumanji shows how software-controlled data placement for distributed last-level caches enables optimizing for a variety of application objectives on a single processor. Specifically, Jumanji targets a datacenter environment where co-running applications care about either tail latency or throughput, and all applications care about security. Second, täkō demonstrates how a major NDC paradigm, data-triggered computation, can be implemented by letting software observe and manipulate data as it traverses the cache hierarchy. In täkō, applications register software callbacks that execute in response to in-cache data movement (i.e., cache misses, evictions, and writebacks), a novel data-centric mechanism that supports many optimizations, each of which previously required custom hardware. Finally, Leviathan unifies multiple NDC paradigms under a single architecture and programming interface to provide a truly practical NDC system. Together, these contributions demonstrate the feasibility of programmable data movement in general-purpose processors.
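The idea of data-triggered computation described above can be illustrated with a purely software toy model. The sketch below is not the täkō interface and does not model hardware. The ToyCache class, its on_miss, on_eviction, and on_writeback parameters, the LRU replacement policy, and the capacity of two lines are all assumptions made for illustration. It shows only the general pattern: software registers callbacks, and the cache invokes them when data moves.

```python
from collections import OrderedDict


class ToyCache:
    """Hypothetical LRU write-back cache that fires callbacks on data movement."""

    def __init__(self, capacity, backing,
                 on_miss=None, on_eviction=None, on_writeback=None):
        self.capacity = capacity
        self.backing = backing        # dict standing in for main memory
        self.lines = OrderedDict()    # addr -> (value, dirty), oldest first
        self.on_miss = on_miss
        self.on_eviction = on_eviction
        self.on_writeback = on_writeback

    def _insert(self, addr, value, dirty):
        # Evict the least recently used line if the cache is full.
        if addr not in self.lines and len(self.lines) >= self.capacity:
            victim, (vval, vdirty) = self.lines.popitem(last=False)
            if self.on_eviction:
                self.on_eviction(victim, vval)
            if vdirty:
                # The callback may transform data on its way to memory.
                if self.on_writeback:
                    vval = self.on_writeback(victim, vval)
                self.backing[victim] = vval
        self.lines[addr] = (value, dirty)
        self.lines.move_to_end(addr)

    def load(self, addr):
        if addr in self.lines:
            self.lines.move_to_end(addr)
            return self.lines[addr][0]
        value = self.backing.get(addr, 0)
        if self.on_miss:
            # The callback may transform data on its way into the cache.
            value = self.on_miss(addr, value)
        self._insert(addr, value, dirty=False)
        return value

    def store(self, addr, value):
        # Simplification: stores allocate a line without fetching,
        # so they never trigger the miss callback.
        self._insert(addr, value, dirty=True)


events = []


def log_miss(addr, value):
    events.append(("miss", addr))
    return value


def log_eviction(addr, value):
    events.append(("eviction", addr))


def log_writeback(addr, value):
    events.append(("writeback", addr))
    return value


memory = {0: 10, 1: 20, 2: 30}
cache = ToyCache(2, memory, log_miss, log_eviction, log_writeback)

cache.load(0)        # miss on 0
cache.store(1, 99)   # line 1 becomes dirty
cache.load(2)        # miss on 2, evicts clean line 0
cache.load(0)        # miss on 0, evicts dirty line 1 and writes it back
print(events)
print(memory[1])
```

In this toy, the logging callbacks could be swapped for ones that compress, prefetch, or aggregate data. A real system would run such callbacks near the cache rather than in an interpreter, which this sketch makes no attempt to model.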
- Electrical and Computer Engineering
- Doctor of Philosophy (PhD)