Supporting multiple accelerators in high-level programming models

Yonghong Yan, Pei-Hung Lin, Chunhua Liao, Bronis R. de Supinski, Daniel J. Quinlan
2015 Proceedings of the Sixth International Workshop on Programming Models and Applications for Multicores and Manycores - PMAM '15  
Computational accelerators, such as manycore NVIDIA GPUs, Intel Xeon Phi and FPGAs, are becoming common in workstations, servers and supercomputers for scientific and engineering applications. Efficiently exploiting the massive parallelism these accelerators provide requires the design and implementation of productive programming models. In this paper, we explore support of multiple accelerators in high-level programming models. We design novel language extensions to OpenMP to support
... ng data and computation regions to multiple accelerators (devices). These extensions allow for distributing data and computation among a list of devices via easy-to-use annotation interfaces, including specifying the distribution of multi-dimensional arrays and declaring shared data regions among accelerators. Computation distribution is realized by partitioning a loop iteration space among accelerators. We implement mechanisms to marshal/unmarshal and to move data of noncontiguous array subregions and shared regions between accelerators without involving CPUs. We design reduction techniques that work across multiple accelerators. Combined compiler and runtime support is designed to manage multiple GPUs using asynchronous operations and threading mechanisms. We implement our solutions for NVIDIA GPUs and demonstrate through example OpenMP codes the effectiveness of our solutions for the improvement of both performance and scalability.
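The idea of partitioning a loop iteration space among accelerators and then reducing across them can be illustrated with a minimal host-only sketch in C. It is not the paper's language extension or runtime: the names `iter_range`, `block_partition` and `multi_device_sum` are hypothetical, the devices are simulated by an ordinary loop on the CPU, and a simple block distribution is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open iteration range [begin, end) assigned to one device.
   Hypothetical type, introduced only for this illustration. */
typedef struct {
    size_t begin;
    size_t end;
} iter_range;

/* Block-partition n loop iterations over num_devices devices.
   The first (n % num_devices) devices receive one extra iteration,
   so chunk sizes differ by at most one. */
iter_range block_partition(size_t n, size_t num_devices, size_t dev)
{
    assert(num_devices > 0 && dev < num_devices);
    size_t base = n / num_devices;
    size_t extra = n % num_devices;
    iter_range r;
    r.begin = dev * base + (dev < extra ? dev : extra);
    r.end = r.begin + base + (dev < extra ? 1 : 0);
    return r;
}

/* Sum a[0..n) by giving each simulated device its own chunk and then
   combining the per-device partial sums in a final reduction step.
   On real hardware each chunk would be processed by a separate
   accelerator; here the "devices" run one after another on the host. */
double multi_device_sum(const double *a, size_t n, size_t num_devices)
{
    double total = 0.0;
    for (size_t dev = 0; dev < num_devices; ++dev) {
        iter_range r = block_partition(n, num_devices, dev);
        double partial = 0.0; /* per-device partial result */
        for (size_t i = r.begin; i < r.end; ++i)
            partial += a[i];
        total += partial;     /* cross-device reduction */
    }
    return total;
}
```

With 10 iterations and 3 devices, this sketch assigns the ranges [0, 4), [4, 7) and [7, 10). Other distributions (cyclic, or weighted by device speed) would only change `block_partition`.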
doi:10.1145/2712386.2712405 dblp:conf/ppopp/0001LLSQ15 fatcat:3bkv56c7ojb2te3q7da32zjgde