Optimizing legacy molecular dynamics software with directive-based offload

W. Michael Brown, Jan-Michael Y. Carrillo, Nitin Gavhane, Foram M. Thakkar, Steven J. Plimpton
2015 Computer Physics Communications  
Directive-based programming models are one solution for exploiting many-core coprocessors to increase simulation rates in molecular dynamics. They offer the potential to reduce code complexity with offload models that can selectively target computations to run on the CPU, the coprocessor, or both. In this paper, we describe modifications to the LAMMPS molecular dynamics code to enable concurrent calculations on a CPU and coprocessor. We demonstrate that standard molecular dynamics algorithms
run efficiently on both the CPU and an x86-based coprocessor using the same subroutines. As a consequence, we demonstrate that code optimizations for the coprocessor also result in speedups on the CPU, in extreme cases up to 4.7X. We provide results for LAMMPS benchmarks and for production molecular dynamics simulations using the Stampede hybrid supercomputer with both Intel® Xeon Phi™ coprocessors and Nvidia GPUs. The optimizations presented have increased simulation rates by over 2X for organic molecules and over 7X for liquid crystals on Stampede. The optimizations are available as part of the "Intel package" supplied with LAMMPS.

(W. Michael Brown), carrillojy@ornl.gov (Jan-Michael Y. Carrillo), Nitin.Gavhane@shell.com (Nitin Gavhane), Foram.Thakkar@shell.com (Foram M. Thakkar), sjplimp@sandia.gov (Steven J. Plimpton)

modifications are necessary to use the system efficiently. In our previous work, we focused on the design of efficient algorithms to use GPU accelerators for large-scale molecular dynamics (MD) simulations [1, 2, 3, 4]. For this work, a separate library was designed for the LAMMPS molecular dynamics software [5], with MD algorithms modified to run efficiently on GPUs. This library could be compiled using either CUDA or OpenCL. Although this approach has allowed for GPU acceleration in production simulations, the use of a separate programming language and different algorithms on the CPU and GPU introduces additional code complexity and requires optimization of separate code paths depending on the target. For example, in MD, redundant calculation is typically used to avoid memory conflicts for force updates on the GPU
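
The trade-off mentioned last, redundant calculation in exchange for conflict-free force updates, can be illustrated with a minimal C++ sketch. It is not taken from LAMMPS or from the paper. The names `pair_force`, `forces_newton`, `forces_redundant`, and `max_abs_diff` are hypothetical, and the all-pairs Lennard-Jones loop without a cutoff or neighbor list is a simplifying assumption. The first routine evaluates each pair once and writes to both atoms, which would produce conflicting writes if the outer loop were split across threads. The second evaluates each pair twice, but each outer iteration writes only to its own atom.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec3 = std::array<double, 3>;

// Lennard-Jones force on atom i due to atom j, with epsilon = sigma = 1
// and no cutoff (hypothetical helper, for illustration only).
Vec3 pair_force(const Vec3& xi, const Vec3& xj) {
    Vec3 d{xi[0] - xj[0], xi[1] - xj[1], xi[2] - xj[2]};
    double r2 = d[0] * d[0] + d[1] * d[1] + d[2] * d[2];
    double inv_r2 = 1.0 / r2;
    double inv_r6 = inv_r2 * inv_r2 * inv_r2;
    // F_i = 24 * (2 / r^12 - 1 / r^6) / r^2 * (x_i - x_j)
    double scale = 24.0 * inv_r6 * (2.0 * inv_r6 - 1.0) * inv_r2;
    return {scale * d[0], scale * d[1], scale * d[2]};
}

// Each pair is evaluated once and Newton's third law supplies the reaction.
// Iteration i also writes to f[j], so parallelizing over i would need
// atomics or per-thread force copies.
std::vector<Vec3> forces_newton(const std::vector<Vec3>& x) {
    std::vector<Vec3> f(x.size(), Vec3{0.0, 0.0, 0.0});
    for (std::size_t i = 0; i < x.size(); ++i) {
        for (std::size_t j = i + 1; j < x.size(); ++j) {
            Vec3 fij = pair_force(x[i], x[j]);
            for (int k = 0; k < 3; ++k) {
                f[i][k] += fij[k];
                f[j][k] -= fij[k];
            }
        }
    }
    return f;
}

// Each pair is evaluated twice (once from each side). Iteration i writes
// only to f[i], so iterations over i are independent of one another.
std::vector<Vec3> forces_redundant(const std::vector<Vec3>& x) {
    std::vector<Vec3> f(x.size(), Vec3{0.0, 0.0, 0.0});
    for (std::size_t i = 0; i < x.size(); ++i) {
        Vec3 fi{0.0, 0.0, 0.0};
        for (std::size_t j = 0; j < x.size(); ++j) {
            if (j == i) continue;
            Vec3 fij = pair_force(x[i], x[j]);
            for (int k = 0; k < 3; ++k) fi[k] += fij[k];
        }
        f[i] = fi;
    }
    return f;
}

// Largest componentwise difference between two force arrays.
double max_abs_diff(const std::vector<Vec3>& a, const std::vector<Vec3>& b) {
    assert(a.size() == b.size());
    double m = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        for (int k = 0; k < 3; ++k)
            m = std::fmax(m, std::fabs(a[i][k] - b[i][k]));
    return m;
}
```

Calling both routines on the same set of positions and comparing the results with `max_abs_diff` should give agreement to within floating-point rounding. The redundant variant performs roughly twice the pair evaluations, which is the cost paid for independent writes.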
doi:10.1016/j.cpc.2015.05.004 fatcat:xbyof4cbcngi7cv5damtslie2a