Carnegie Mellon University
Blanco_cmu_0041E_10967.pdf (4.05 MB)

Hardware-Software Implementations of High Performance Graph Algorithms

Download (4.05 MB)
posted on 2023-05-02, 18:42 authored by Maia Blanco

Graph analytics have seen increased attention in science and industry as a tool to gain useful insights from relational data. However, techniques and principles for high performance computing have traditionally focused on dense, regular problems. As a result, graph algorithms have retained a reputation for being ‘irregular’ and difficult for which to design implementations that perform well, i.e., that make efficient use of existing architectures. In this dissertation, we demonstrate that careful analysis and co-design of graph algorithm implementations for modern processor architectures provides improvements in performance whose underlying principles and techniques can be understood and explained at the level of the hardware and software interacting. 

Through algorithms across two families of graph processing workloads, we demonstrate that careful attention to the mapping of graph operations on functional units, cores, and memory hierarchy improves performance and understanding of how future algorithms, compilers, and architectures should be designed. In our work on motif-finding algorithms such as Triangle Counting and K-Truss, we demonstrate 1) the performance-improving mechanism of graph sorting and why superficially inefficient use of SIMD for set intersections still improves performance, and 2) the effect of careful task sizing and granularity of parallelism on many-thread processors. In graph traversal workloads such as Page Rank, Single Source Shortest Paths (SSSP), and Breadth First Search (BFS), we demonstrate that 1) graph traversal primitives in the form of matrixvector operations have a large capacity for memory level parallelism, but only if designed to overlap functional unit and memory latency and 2) inter-core cache coherence penalties in asynchronous parallel implementations can be mitigated by intentionally delaying those asynchronous updates. The application of our co-design methodology across two families of graph processing workloads enables better performance in existing graph processing frameworks and provides increased insight to the designers of future algorithms, compilers, and architectures. 




Degree Type

  • Dissertation


  • Electrical and Computer Engineering

Degree Name

  • Doctor of Philosophy (PhD)


Tze Meng Low

Usage metrics


    Ref. manager