Finding and Exploiting Parallelism with Data-Structure-Aware Static and Dynamic Analysis

2019-03-04T20:13:09Z (GMT) by Christopher Fallin
Maximizing performance on modern multicore hardware demands aggressive optimizations. Large amounts of legacy code are written for sequential hardware, and parallelizing this code is an important goal. Some programs are written for one parallel platform but must be periodically ported to other platforms, or updated as the existing platform's characteristics change, for example by splitting work at a different granularity or tiling work to fit in a cache. A programmer tasked with this work will likely refactor the code in ways that diverge from the original implementation's step-by-step operation but nevertheless compute correct results. Unfortunately, because modern compilers are unaware of the higher-level structure of a program, they are largely unable to perform this work automatically. First, they generally preserve the operation of the original program at the language-semantics level, down to implementation details. Parallelization transforms are therefore often hindered by false dependencies between data-structure operations that are semantically commutative: reordering two tree insertions, for example, would produce a different program heap even though the resulting tree elements are identical. Second, because they analyze the program at this low level, they are generally unable to derive invariants at a higher level, such as that no two pointers in a particular list alias each other. Both of these shortcomings prevent modern parallelization analyses from reaching the performance that a human programmer can achieve by refactoring.

In this thesis, we introduce an enhanced compiler and runtime system that can parallelize sequential loops in a program by reasoning about the program's high-level semantics. We make three contributions. First, we enhance the compiler to reason about data structures as first-class values: operations on maps and lists, and traversals over them, are encoded directly.
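To make the commutativity point concrete, here is a minimal illustrative sketch (not code from the thesis): two insertions of distinct keys into a map commute at the container level, even though a byte-level view of the heap (bucket chains, allocation order) could differ between the two orders and so would report a false dependency.

```python
# Two insertions with distinct keys are semantically commutative:
# either order yields the same abstract map. A compiler that models
# the container as a first-class value can safely reorder them.

def insert_all(pairs):
    """Insert (key, value) pairs into a fresh map, in order."""
    m = {}
    for key, value in pairs:
        m[key] = value
    return m

forward = insert_all([("left", 1), ("right", 2)])
reverse = insert_all([("right", 2), ("left", 1)])
assert forward == reverse  # same container contents in either order
```

A compiler comparing only the abstract container contents sees no dependency between the two insertions; one comparing concrete heap layouts would conservatively serialize them.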
Semantic models map library data types that implement these standard containers onto the IR intrinsics. This enables the compiler to reason about the commutativity of various operations and provides a basis for deriving data-structure invariants. Second, we present distinctness analysis, a static alias analysis specially designed to derive results that are just as precise as parallelization requires, while remaining simple and widely applicable. This analysis discovers non-aliasing across loop iterations, a form of flow sensitivity that is nevertheless much simpler than past work's closed-form analysis of linear indexing into data structures. Finally, for cases in which infrequent occurrences rule out a loop parallelization, or in which static analysis cannot prove the parallelization safe, we leverage dynamic checks. Our hybrid static-dynamic approach extends the logic of our static alias analysis to find a set of checks to perform at runtime, and then uses the implications of these checks in its static reasoning. With the proper runtime techniques, these dynamic checks can be used to parallelize many loops without any speculation or rollback overhead.

This system, which we have proven sound, performs significantly better than prior standard loop-parallelization techniques. We argue that a compiler and runtime system with built-in data-structure primitives, simple but effective alias-analysis extensions, and a few carefully placed dynamic checks is a promising platform for macro-scale program transforms such as loop parallelization, and for many other lines of future work.
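The hybrid static-dynamic idea can be sketched as follows (an illustrative sketch with hypothetical helper names, not the thesis's actual runtime API): before running loop iterations in parallel, a single runtime check verifies that the keys the iterations will write are pairwise distinct. If the check passes, the writes cannot conflict, so the iterations run in parallel with no speculation or rollback; otherwise the loop falls back to its original sequential form.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_update(table, items, key_of, work):
    """Set table[key_of(it)] = work(it) for each item, running the
    work in parallel when a dynamic distinctness check proves the
    writes cannot conflict. (key_of and work are hypothetical
    parameters standing in for compiler-extracted loop code.)"""
    keys = [key_of(it) for it in items]
    if len(set(keys)) == len(keys):
        # Distinct keys: iterations write disjoint map entries, so
        # they may run in parallel without speculation or rollback.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(work, items))
        for k, r in zip(keys, results):
            table[k] = r
    else:
        # Check failed: preserve sequential semantics (later writes win).
        for k, it in zip(keys, items):
            table[k] = work(it)
    return table

# Example: squaring items keyed by their string form.
out = parallel_update({}, [1, 2, 3], key_of=str, work=lambda x: x * x)
assert out == {"1": 1, "2": 4, "3": 9}
```

The check itself is cheap relative to the loop body, and because it runs before any iteration executes, no speculative state ever needs to be rolled back.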