Carnegie Mellon University
Machine learning for flash caching in bulk storage systems

thesis
posted on 2024-11-13, 20:56, authored by Lin Kit Daniel Wong

Flash caches are used to reduce peak backend load for throughput-constrained data center services, reducing the total number of backend servers required. Bulk storage systems are a large-scale example; backed by high capacity but low-throughput hard disks, they use flash caches to provide a cost-effective storage layer underlying everything from blobstores to data warehouses.

However, flash caches must address flash’s limited write endurance by limiting the number of flash writes to avoid premature wear-out. Thus, most flash caches rely on admission policies to filter cache insertions and maximize the workload-reduction value of each write.

This dissertation evaluates and demonstrates potential uses of ML in place of traditional heuristic cache management policies for flash caches in bulk storage systems. The most successful elements of my research are embodied in a flash cache system called Baleen, which uses coordinated ML admission and prefetching to reduce peak backend load. After learning painful lessons with early ML policy attempts, I exploit a new cache residency model (episodes) to guide model training. I focus on optimizing an end-to-end metric (Disk-head Time) that measures backend load more accurately than IO miss rate or byte miss rate. Evaluation using 7-day Meta traces from 7 storage clusters shows that Baleen reduces Peak Disk-head Time (and hence backend hard disks required) by 12% over state-of-the-art policies for a fixed flash write rate constraint.
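The Disk-head Time metric above charges each cache miss for the time a disk head spends serving it (positioning plus transfer), rather than simply counting missed IOs or bytes. The sketch below illustrates the idea; the seek and bandwidth constants are illustrative placeholders, not figures from the dissertation:

```python
# Hedged sketch: disk-head time charges each miss for seek + transfer time,
# so it weights misses by actual backend disk work rather than IO count.
# SEEK_MS and BANDWIDTH_MBPS are assumed placeholder values.
SEEK_MS = 8.0           # assumed average positioning time per IO (ms)
BANDWIDTH_MBPS = 150.0  # assumed sustained disk transfer rate (MB/s)

def disk_head_time_ms(io_size_bytes):
    """Time (ms) a disk head is busy serving one missed IO."""
    transfer_ms = io_size_bytes / (BANDWIDTH_MBPS * 1e6) * 1e3
    return SEEK_MS + transfer_ms

def total_disk_head_time_ms(missed_io_sizes):
    """Backend load imposed by a sequence of cache misses, in disk-head ms."""
    return sum(disk_head_time_ms(s) for s in missed_io_sizes)

# Two miss streams with identical IO miss counts impose very different load:
small = total_disk_head_time_ms([4096] * 100)         # 100 tiny misses
large = total_disk_head_time_ms([8 * 1024**2] * 100)  # 100 large misses
```

Under these assumed constants, the large-IO stream costs roughly eight times the disk-head time of the small-IO stream despite the identical miss count, which is why miss rate alone can mislead an admission policy.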

I present a TCO (total cost of ownership) formula quantifying the costs of additional flash writes against reductions in Peak Disk-head Time in terms of flash drives and hard disks needed. Baleen-TCO chooses optimal flash write rates and reduces estimated TCO by 17%.
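The tradeoff that such a TCO formula captures can be sketched as a back-of-the-envelope calculation: higher flash write rates wear out flash drives faster, but may lower Peak Disk-head Time and therefore the number of hard disks required. All function names, constants, and prices below are hypothetical placeholders, not the dissertation's actual formula or parameters:

```python
# Hedged sketch of the flash-writes vs. hard-disks TCO tradeoff.
# All endurance, price, and lifetime constants are hypothetical.

def flash_cost_per_year(write_rate_mbps, drive_endurance_tb=3000,
                        drive_price=500.0):
    """Dollar cost of flash drives worn out per year at a given write rate."""
    tb_written_per_year = write_rate_mbps * 1e6 * 86400 * 365 / 1e12
    drives_worn_per_year = tb_written_per_year / drive_endurance_tb
    return drives_worn_per_year * drive_price

def hdd_cost_per_year(peak_disk_head_time, hdd_price=250.0,
                      hdd_lifetime_years=5.0):
    """Dollar cost of the hard disks needed to serve peak backend load.

    peak_disk_head_time is in disk-seconds per second, so it equals the
    number of fully-busy disks needed at peak (utilization <= 1 each).
    """
    return peak_disk_head_time * hdd_price / hdd_lifetime_years

def tco_per_year(write_rate_mbps, peak_disk_head_time):
    return (flash_cost_per_year(write_rate_mbps)
            + hdd_cost_per_year(peak_disk_head_time))

# Spending more flash writes can pay off if it cuts peak load enough:
low_writes  = tco_per_year(write_rate_mbps=10, peak_disk_head_time=120.0)
high_writes = tco_per_year(write_rate_mbps=30, peak_disk_head_time=100.0)
```

With these placeholder numbers the higher write rate wins, because the saved hard disks outweigh the extra flash wear; with different cost curves the opposite can hold, which is why choosing the write rate by optimizing the TCO formula (as Baleen-TCO does) matters.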

Workloads change over time, requiring that caches adapt to maintain performance. I present a strategy for peak load reduction that adapts selectivity to load levels. I also evaluate workload drift and its impact on ML policy performance using 30-day Meta traces.

Baleen is the result of substantial exploration and experimentation with ML for caching. I present lessons learned from additional strategies considered and explain why they saw limited success on our workloads. These include enhancements for ML-based eviction, more complex ML models, and optimizing the use of DRAM in hybrid caches. I also present lessons from ML production deployments.

Code and traces are available via https://www.pdl.cmu.edu/CILES/. These include our 7-day traces, which were the most extensive public collection of traces from a production bulk storage system at the time of writing.

Funding

CNS: Core: Medium: Understanding and addressing device-reliability heterogeneity in large-scale distributed storage

Directorate for Computer & Information Science & Engineering

History

Date

2024-12-01

Degree Type

  • Dissertation

Department

  • Computer Science

Degree Name

  • Doctor of Philosophy (PhD)

Advisor(s)

Gregory R. Ganger
