Input, Representation, and Access Pattern Guided Cache Locality Optimizations for Graph Analytics

Balaji, Vignesh

doi:10.1184/R1/19229661.v1

Balaji_cmu_0041E_10678.pdf (2.86 MB)

Input, Representation, and Access Pattern Guided Cache Locality Optimizations for Graph Analytics

thesis

posted on 2022-03-01, 21:50 authored by Vignesh BalajiVignesh Balaji

Graph analytics has many important commercial and scientific applications ranging from social network analysis to tracking disease transmission using contact networks. Early works in graph analytics primarily focused on processing graphs using large-scale distributed systems. More recently, increasing main memory capacities and core counts have prompted a shift towards analyzing large graphs using just a single machine. Multiple studies have demonstrated that in-memory graph analytics can even outperform distributed graph analytics. However, performance of in-memory graph analytics is still far from optimal because the characteristic irregular memory access pattern of graph applications leads to poor cache locality. Irregular memory accesses are fundamental to graph analytics and are a product of the sparsity pattern of input graphs and the compressed representation used to store graphs. The main insight of this thesis is that the different sources of irregularity in graph analytics also contain valuable information that can be used to design cache locality optimizations. Using this insight, we propose three types of optimizations that each leverage properties

of input graphs, compressed representations, and application access patterns to improve locality of graph

analytics workloads. First, we present Selective Graph Reordering and RADAR which are cache locality optimizations that leverage the structural properties of input graphs. Graph reordering uses a graph’s structure to improve the data layout for graph application data in a bid to improve locality. However, when accounting for overheads,

graph reordering offers questionable benefits; providing speedups for some graphs while causing a net slowdown in others. To improve the viability of graph reordering, we develop a low-overhead analytical model to accurately predict the performance improvement from reordering. Our analytical model allows selective application of graph reordering only for the graphs which are expected to receive high speedups while avoiding slowdowns for other graphs. RADAR builds upon graph reordering to perform memory efficient data duplication for power-law graphs to eliminate expensive atomic updates. Combining graph reordering with data duplication allows RADAR to simultaneously optimize cache locality and scalability of parallel graph applications.

Second, we present P-OPT, an optimized cache replacement policy that leverages the popular compressed

representation used in graph analytics – CSR and CSC – to improve locality for graph applications. Our work

is based on the observation that the CSR and CSC efficiently encode information about future accesses of

graph application data, enabling Belady’s optimal cache replacement policy (an oracular policy that represents

the theoretical upper bound in cache replacement). P-OPT is a practical implementation of Belady’s optimal replacement policy. By using the graph structure to guide near-optimal cache replacement, P-OPT is able to significantly reduce cache misses compared to heuristics-based, state-of-the-art replacement policies. Finally, we present HARP, a hardware-based cache locality optimization that leverages the typical

access pattern of graph analytics workloads. HARP builds upon Propagation Blocking, a software cache locality optimization targeting applications with irregular memory updates. Due to the pervasiveness of the irregular memory updates, Propagation Blocking applies to a broader range of workloads beyond just graph analytics. HARP provides architecture support for Propagation Blocking to eliminate the lingering sources of inefficiency in a Propagation Blocking execution, allowing HARP to further improve the performance gains from Propagation Blocking for graph analytics and other irregular memory workloads.

History

Date

2021-06-14

Degree Type

Dissertation

Department

Electrical and Computer Engineering

Degree Name

Doctor of Philosophy (PhD)

Advisor(s)

Brandon Lucia

Usage metrics

Keywords

atomics Cache Locality Optimization Cache Replacement Graph Analytics Graph Reordering Irregular Memory Accesses

Licence

In Copyright

Exports

RefWorks

BibTeX

Ref. manager

Endnote

DataCite

NLM

DC

Input, Representation, and Access Pattern Guided Cache Locality Optimizations for Graph Analytics

History

Date

Degree Type

Department

Degree Name

Advisor(s)

Usage metrics

Categories

Keywords

Licence

Exports