posted on 2009-11-01, 00:00authored bySiddhartha Chatterjee, Guy E. Blelloch, Allan L. Fisher
Abstract: "Data-parallel programming languages have many desirable features, such as single-thread semantics and the ability to express fine-grained parallelism. However, it is challenging to implement such languages efficiently on conventional MIMD multiprocessors, because these machines incur a high overhead for small grain sizes. This paper presents compile-time analysis techniques for data-parallel program graphs that reduce these overheads in two ways: by stepping up the grain size, and by relaxing the synchronous nature of the computation without altering the program semantics.The algorithms partition the program graph into clusters of nodes such that all nodes in a cluster have the same loop structure, and futher refine these clusters into epochs based on generation and consumption patterns of data vectors. This converts the fine-grain parallelism in the original program to medium-grain loop parallelism, which is better suited to MIMD machines. A compiler has been implemented based on these ideas. We present performance results for data-parallel kernels analyzed by the compiler and converted to single-program multiple-data (SPMD) code running on an Encore Multimax."