MPI-based tools for large-scale training and optimization at HPC sites

Vladimir Loncar, Jean-Roch Vlimant, Sofia Vallecorsa, Gul Rukh Khattak, Maurizio Pierini, Thong Nguyen, Federico Carminati
2019 Zenodo  
MPI-learn and MPI-opt are libraries for large-scale training and hyper-parameter optimization of deep neural networks. The two libraries, based on the Message Passing Interface, allow these tasks to be performed on GPU clusters through different kinds of parallelism. The main characteristic of these libraries is their flexibility: the user has complete freedom in building her own model, thanks to the multi-backend support. In addition, the libraries support several cluster architectures, allowing deployment on multiple platforms. This generality could make them the basis for a train & optimise service for the HEP community. We present scalability results obtained from two typical HEP use cases: jet identification from raw data and shower generation from a GAN model. Results on GPU clusters were obtained at the ORNL TITAN supercomputer and other HPC facilities, as well as by exploiting commercial cloud resources and OpenStack. A comprehensive comparison of scalability performance across platforms will be presented, together with a detailed description of the libraries and their functionalities.
doi:10.5281/zenodo.3598748 fatcat:mkgv5k4lavedjhddtzi5pnhqc4