Automatic Parallelization of Python Programs for Distributed Heterogeneous Computing
24 nodes and 144 GPUs in the OLCF Summit supercomputer for the Space-Time Adaptive Processing (STAP) radar application. ...
This paper introduces a novel approach to automatic ahead-of-time (AOT) parallelization and optimization of sequential Python programs for execution on distributed heterogeneous platforms. ...
Shirako, Hayashi, Paul, Tumanov, Sarkar hybrid Python/C++ code generation, fine-grained NumPy-to-CuPy conversion, and profile-based CPU/GPU runtime selection. ...