Finding and Exploiting Parallelism with Data-Structure-Aware Static and Dynamic Analysis
Christopher Fallin
2019

Maximizing performance on modern multicore hardware demands aggressive optimizations. Large amounts of legacy code are written for sequential hardware, and parallelizing this code is an important goal. Some programs are written for one parallel platform but must be periodically ported to other platforms, or updated as the existing platform's characteristics change, for example by splitting work at a different granularity or tiling work to fit in a cache. A programmer tasked with this work will likely refactor the code so that it diverges from the original implementation's step-by-step operation but nevertheless computes correct results. Unfortunately, because modern compilers are unaware of the higher-level structure of a program, they are largely unable to perform this work automatically. First, they generally preserve the operation of the original program at the language-semantics level, down to implementation details. Parallelization transforms are thus often hindered by false dependencies between data-structure operations that are semantically commutative: reordering two tree insertions, for example, would produce a different program heap even though the resulting tree contents are identical. Second, by analyzing the program at this low level, they are generally unable to derive invariants at a higher level, such as that no two pointers in a particular list alias each other. Both of these shortcomings prevent modern parallelization analyses from reaching the performance that a human programmer can achieve by refactoring.

In this thesis, we introduce an enhanced compiler and runtime system that parallelizes sequential loops by reasoning about a program's high-level semantics. We make three contributions. First, we enhance the compiler to reason about data structures as first-class values: operations on maps and lists, and traversals over them, are encoded directly in the IR. Semantic models map library data types that implement these standard containers onto the IR intrinsics. This enables the compiler to reason about the commutativity of various operations and provides a basis for deriving data-structure invariants. Second, we present distinctness analysis, a static alias analysis specially designed to derive results for parallelization that are just as precise as needed while remaining simple and widely applicable. This analysis discovers non-aliasing across loop iterations, a form of flow sensitivity that is nevertheless much simpler than past work's closed-form analysis of linear indexing into data structures. Finally, for cases in which infrequent events rule out a loop parallelization, or in which static analysis cannot prove the parallelization safe, we leverage dynamic checks. Our hybrid static-dynamic approach extends the logic of our static alias analysis to find a set of checks to perform at runtime, and then uses the implications of those checks in its static reasoning. With the proper runtime techniques, these dynamic checks can parallelize many loops without any speculation or rollback overhead. This system, which we have proven sound, performs significantly better than prior standard loop-parallelization techniques.
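As a concrete illustration of the commutativity problem described above, consider the following sketch in plain Java; java.util.TreeMap stands in here for any tree-backed container, the class and variable names are illustrative only, and none of this is the thesis's actual compiler IR.

    import java.util.TreeMap;

    // Two semantically commutative insertions: swapping their order changes
    // the internal heap layout of the tree, but not the map's contents.
    public class CommutativityDemo {
        public static void main(String[] args) {
            TreeMap<Integer, String> a = new TreeMap<>();
            a.put(1, "x");
            a.put(2, "y");

            TreeMap<Integer, String> b = new TreeMap<>();
            b.put(2, "y");  // same two insertions, opposite order
            b.put(1, "x");

            // At the container level the two histories are indistinguishable:
            System.out.println(a.equals(b));  // prints "true"

            // Internally, however, which node became the root (and the
            // red-black bookkeeping) differs between a and b. A compiler
            // that models only raw loads and stores sees two different
            // heaps and must keep the insertions ordered; one that models
            // put() as a map intrinsic can prove that insertions of
            // distinct keys commute.
        }
    }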
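The aliasing invariant targeted by distinctness analysis is likewise easy to state in source-level terms. The sequential loop below is the kind of loop the analysis aims to license; Account, applyInterest, and the field names are hypothetical examples, not code from the thesis.

    import java.util.List;

    class Account {
        long balance;
    }

    class DistinctnessDemo {
        // The loop writes only through the current element pointer. If no
        // two elements of `accounts` alias (accounts.get(i) != accounts.get(j)
        // for all i != j), then iterations touch disjoint heap locations and
        // may safely run in parallel. Establishing exactly this
        // cross-iteration non-aliasing is what distinctness analysis does.
        static void applyInterest(List<Account> accounts, double rate) {
            for (Account acct : accounts) {
                acct.balance += (long) (acct.balance * rate);
            }
        }
    }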
We argue that a compiler and runtime system with built-in data-structure primitives, simple but effective alias-analysis extensions, and a few carefully placed dynamic checks is a promising platform for macro-scale program transforms such as loop parallelization, and potentially for many other lines of future work.
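To make the hybrid static-dynamic idea concrete, here is a minimal sketch, again in plain Java rather than the thesis's actual runtime: when static analysis cannot prove the elements of a list distinct, a single linear-time identity check at loop entry can establish the property for the actual input, after which the loop body runs in parallel. The allDistinct helper and the fallback structure are our own illustrative choices.

    import java.util.IdentityHashMap;
    import java.util.List;

    class HybridCheckDemo {
        // Reuses the illustrative Account class from the earlier sketch.

        // Returns true iff no two elements of xs are the same object
        // (reference identity, the aliasing notion that matters here).
        static <T> boolean allDistinct(List<T> xs) {
            IdentityHashMap<T, Boolean> seen = new IdentityHashMap<>();
            for (T x : xs) {
                if (seen.put(x, Boolean.TRUE) != null) {
                    return false;
                }
            }
            return true;
        }

        static void applyInterest(List<Account> accounts, double rate) {
            if (allDistinct(accounts)) {
                // Check passed: iterations are independent; run in parallel.
                accounts.parallelStream()
                        .forEach(a -> a.balance += (long) (a.balance * rate));
            } else {
                // Check failed: fall back to the original sequential loop.
                for (Account a : accounts) {
                    a.balance += (long) (a.balance * rate);
                }
            }
        }
    }

Because the check runs before the loop and either licenses or declines the transform, there is nothing to undo if it fails; this is the sense in which such checks avoid speculation and rollback entirely.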