
Reducing Performance Overhead of Direct Access NVM Storage Redundancy

thesis
posted on 02.10.2020 by Rajat Kateja
Non-volatile memory (NVM) based storage is poised for mainstream deployment. DIMM form-factor NVM devices reside on the memory bus and offer DRAM-like access granularities and latencies along with non-volatility. NVM's Direct Access (DAX) interface enables applications to map persistent data into their address space and access it with load and store instructions, eliminating system software overheads.

Production deployment of DAX NVM storage would require that the storage system offer resilience against firmware-bug-induced data corruption, akin to conventional storage systems. Protection against such corruption requires the storage system to maintain system-level redundancy, which we refer to as system-redundancy. With DAX interfacing, the lack of interposed system software makes it challenging to identify the data reads that should trigger system-redundancy verification and the data writes that should trigger system-redundancy updates. Further, the DAX granularities (e.g., 64-byte cache lines) are incongruent with typical system-redundancy granularities (e.g., 4 KB pages), leading to high performance overhead in maintaining system-redundancy.

This dissertation demonstrates that DAX NVM storage systems can efficiently maintain system-redundancy by relaxing the data coverage guarantees or by leveraging a hardware offload. We support the thesis with two case studies: Vilamb and Tvarak.

The Vilamb library maintains system-redundancy asynchronously, avoiding critical-path interpositioning and amortizing the overhead of system-redundancy updates across multiple writes to a page. As a result, Vilamb provides 3-5x the throughput of the state-of-the-art software solution at high operation rates. For applications that need system-redundancy with high performance and can tolerate some delay in redundancy coverage, Vilamb provides a tunable knob between performance and time-to-coverage. Even with the delayed coverage, Vilamb increases the mean time to data loss due to firmware-induced corruptions by up to two orders of magnitude in comparison to maintaining no system-redundancy.

Tvarak is a software-managed hardware offload that efficiently maintains system-redundancy for DAX NVM storage. Tvarak reconciles the mismatch between DAX granularities and typical system-redundancy granularities by introducing cache-line granular checksums (only) for DAX-mapped data. Tvarak also uses caching to reduce the number of extra NVM accesses for maintaining and verifying system-redundancy. Applications' data access locality leads to reuse of system-redundancy information, which Tvarak leverages with a small dedicated on-controller cache and configurable LLC partitions. Simulation-based evaluation demonstrates Tvarak's efficiency. For example, Tvarak reduces Redis set-only performance by only 3%.
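
The granularity mismatch and the benefit of amortizing system-redundancy updates across multiple writes to a page can be illustrated with a small sketch. This is not the Vilamb or Tvarak implementation: `ToyStore`, its methods, and the use of CRC32 as the page checksum are assumptions made purely for illustration, and an in-memory `bytearray` stands in for a DAX-mapped NVM region. The sketch counts how many bytes must be checksummed when a page checksum is refreshed on every 64-byte write versus once per dirty page.

```python
import zlib

PAGE_SIZE = 4096   # typical system-redundancy granularity (from the abstract)
CACHE_LINE = 64    # DAX access granularity (from the abstract)


class ToyStore:
    """Hypothetical in-memory stand-in for a DAX-mapped NVM region."""

    def __init__(self, num_pages):
        self.data = bytearray(num_pages * PAGE_SIZE)
        self.page_csums = [self._page_csum(p) for p in range(num_pages)]
        self.dirty = set()            # pages written since the last flush
        self.bytes_checksummed = 0    # work spent refreshing checksums

    def _page_csum(self, page):
        start = page * PAGE_SIZE
        return zlib.crc32(bytes(self.data[start:start + PAGE_SIZE]))

    def _refresh(self, page):
        self.page_csums[page] = self._page_csum(page)
        self.bytes_checksummed += PAGE_SIZE

    def _store(self, line_index, payload):
        assert len(payload) == CACHE_LINE
        start = line_index * CACHE_LINE
        self.data[start:start + CACHE_LINE] = payload

    def write_line_sync(self, line_index, payload):
        # Refresh the whole-page checksum on every cache-line write.
        self._store(line_index, payload)
        self._refresh(line_index * CACHE_LINE // PAGE_SIZE)

    def write_line_deferred(self, line_index, payload):
        # Only record that the page is dirty; the checksum is stale until flush.
        self._store(line_index, payload)
        self.dirty.add(line_index * CACHE_LINE // PAGE_SIZE)

    def flush_dirty(self):
        # One checksum refresh per dirty page, regardless of write count.
        for page in sorted(self.dirty):
            self._refresh(page)
        self.dirty.clear()

    def verify(self, page):
        return self.page_csums[page] == self._page_csum(page)


def run(deferred):
    store = ToyStore(num_pages=2)
    for i in range(PAGE_SIZE // CACHE_LINE):   # overwrite every line of page 0
        payload = bytes([i]) * CACHE_LINE
        if deferred:
            store.write_line_deferred(i, payload)
        else:
            store.write_line_sync(i, payload)
    store.flush_dirty()
    return store


sync_store = run(deferred=False)
deferred_store = run(deferred=True)
print(sync_store.bytes_checksummed, deferred_store.bytes_checksummed)
```

In this toy model both stores end with identical, valid page checksums, but the deferred variant does far less checksum work. The cost is that a page is unprotected between a write and the next flush, which corresponds to the time-to-coverage trade-off the abstract describes.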

History

Date

01/05/2020

Degree Type

Dissertation

Department

Electrical and Computer Engineering

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Gregory R. Ganger
