Automatic Parallelization of Python Programs for Distributed Heterogeneous Computing
[article]
Jun Shirako, Akihiro Hayashi, Sri Raj Paul, Alexey Tumanov, Vivek Sarkar
2022
arXiv
pre-print
... 24 nodes and 144 GPUs in the OLCF Summit supercomputer for the Space-Time Adaptive Processing (STAP) radar application. ...
This paper introduces a novel approach to automatic ahead-of-time (AOT) parallelization and optimization of sequential Python programs for execution on distributed heterogeneous platforms. ...
... hybrid Python/C++ code generation, fine-grained NumPy-to-CuPy conversion, and profile-based CPU/GPU runtime selection. ...
arXiv:2203.06233v1
fatcat:4e7sa6j3szgfri5pajrgccuvuu