Compiling affine loop nests for distributed-memory parallel architectures

Uday Bondhugula
2013 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC '13)
We present new techniques for compilation of arbitrarily nested loops with affine dependences for distributed-memory parallel architectures. Our framework is implemented as a source-level transformer that uses the polyhedral model, and generates parallel code with communication expressed with the Message Passing Interface (MPI) library. Compared to all previous approaches, ours is a significant advance either (1) with respect to the generality of input code handled, or (2) efficiency of ... ation code, or both. We provide experimental results on a cluster of multicores demonstrating its effectiveness. In some cases, the code we generate outperforms manually parallelized codes, and in another case it is within 25% of them. To the best of our knowledge, this is the first work reporting end-to-end fully automatic distributed-memory parallelization and code generation for input programs and transformation techniques as general as those we allow.
doi:10.1145/2503210.2503289 dblp:conf/sc/Bondhugula13 fatcat:xgwsrbdygbar3hqi2rtyadu54e