posted on 2005-06-01, 00:00authored byGuy E. Blelloch, Phillip B Gibbons, Harsha Vardhan Simhadri
In this paper we explore a simple and general approach for developing parallel algorithms that lead to good
cache complexity on a variety of parallel cache architectures. The approach is to design nested parallel
algorithms that have low depth (span, critical path length) and for which the natural sequential evaluation
order has low cache complexity in the cache-oblivious model. We describe several cache-oblivious algorithms
with optimal work, polylogarithmic depth, and sequential cache complexities that match the best
sequential algorithms, including the first such algorithms for sorting and for sparse-matrix vector multiply
on matrices with good vertex separators. Our sorting algorithm yields the first cache-oblivious algorithms
with polylogarithmic depth and low sequential cache complexities for list ranking, Euler tour tree labeling,
tree contraction, least common ancestors, graph connectivity, and minimum spanning forest.
Using known mappings, our results lead to low cache complexities on multi-core processors (and shared memory
multiprocessors) with a single level of private caches or a single shared cache. We generalize these
mappings to a multi-level parallel tree-of-caches model that reflects current and future trends in multi-core
cache hierarchies—these new mappings imply that our algorithms also have low cache complexities on
such hierarchies. The key factor in obtaining these low parallel cache complexities is the low depth of the
algorithms we propose.