SmartSim: Online Analytics and Machine Learning for HPC Simulations

Sam Partee, Matthew Ellis, Alessandro Rigazzi, Scott Bachman, Gustavo Marques, Andrew Shao
2021 Zenodo  
SmartSim is an open source library dedicated to enabling online analysis and Machine Learning (ML) for traditional High Performance Computing (HPC) simulations. SmartSim lets simulations written in C, C++, Fortran, and Python call out to PyTorch, TorchScript, and TensorFlow models, as well as any model that supports the ONNX format (e.g., scikit-learn). In addition, the in-transit architecture of SmartSim enables simulation data streaming for online analysis, processing, and training. In this talk we detail the SmartSim architecture and provide benchmarks including online inference and throughput on multiple Cray XC50 supercomputers. We will detail examples including how we used SmartSim to run a 12-member ensemble of global-scale, high-resolution ocean simulations, each spanning 19 compute nodes, all communicating with the same ML architecture at each simulation timestep. Lastly, we will present our plans for open source community involvement and detail current development directions and research.
doi:10.5281/zenodo.4986181 fatcat:niu4n2jonjdufljji2ddrhvfjm