Input, Representation, and Access Pattern Guided Cache Locality Optimizations for Graph Analytics
thesisposted on 01.03.2022, 21:50 authored by Vignesh BalajiVignesh Balaji
Graph analytics has many important commercial and scientific applications ranging from social network analysis to tracking disease transmission using contact networks. Early works in graph analytics primarily focused on processing graphs using large-scale distributed systems. More recently, increasing main memory capacities and core counts have prompted a shift towards analyzing large graphs using just a single machine. Multiple studies have demonstrated that in-memory graph analytics can even outperform distributed graph analytics. However, performance of in-memory graph analytics is still far from optimal because the characteristic irregular memory access pattern of graph applications leads to poor cache locality. Irregular memory accesses are fundamental to graph analytics and are a product of the sparsity pattern of input graphs and the compressed representation used to store graphs. The main insight of this thesis is that the different sources of irregularity in graph analytics also contain valuable information that can be used to design cache locality optimizations. Using this insight, we propose three types of optimizations that each leverage properties
of input graphs, compressed representations, and application access patterns to improve locality of graph
analytics workloads. First, we present Selective Graph Reordering and RADAR which are cache locality optimizations that leverage the structural properties of input graphs. Graph reordering uses a graph’s structure to improve the data layout for graph application data in a bid to improve locality. However, when accounting for overheads,
graph reordering offers questionable benefits; providing speedups for some graphs while causing a net slowdown in others. To improve the viability of graph reordering, we develop a low-overhead analytical model to accurately predict the performance improvement from reordering. Our analytical model allows selective application of graph reordering only for the graphs which are expected to receive high speedups while avoiding slowdowns for other graphs. RADAR builds upon graph reordering to perform memory efficient data duplication for power-law graphs to eliminate expensive atomic updates. Combining graph reordering with data duplication allows RADAR to simultaneously optimize cache locality and scalability of parallel graph applications.
Second, we present P-OPT, an optimized cache replacement policy that leverages the popular compressed
representation used in graph analytics – CSR and CSC – to improve locality for graph applications. Our work
is based on the observation that the CSR and CSC efficiently encode information about future accesses of
graph application data, enabling Belady’s optimal cache replacement policy (an oracular policy that represents
the theoretical upper bound in cache replacement). P-OPT is a practical implementation of Belady’s optimal replacement policy. By using the graph structure to guide near-optimal cache replacement, P-OPT is able to significantly reduce cache misses compared to heuristics-based, state-of-the-art replacement policies. Finally, we present HARP, a hardware-based cache locality optimization that leverages the typical
access pattern of graph analytics workloads. HARP builds upon Propagation Blocking, a software cache locality optimization targeting applications with irregular memory updates. Due to the pervasiveness of the irregular memory updates, Propagation Blocking applies to a broader range of workloads beyond just graph analytics. HARP provides architecture support for Propagation Blocking to eliminate the lingering sources of inefficiency in a Propagation Blocking execution, allowing HARP to further improve the performance gains from Propagation Blocking for graph analytics and other irregular memory workloads.
DepartmentElectrical and Computer Engineering
- Doctor of Philosophy (PhD)