Performance Modeling of Gyrokinetic Toroidal Simulations for a Many-Tasking Runtime System [chapter]

Matthew Anderson, Maciej Brodowicz, Abhishek Kulkarni, Thomas Sterling
2014 Lecture Notes in Computer Science  
Conventional programming practices on multicore processors in high performance computing architectures are not universally effective in terms of efficiency and scalability for many algorithms in scientific computing. One possible solution for improving efficiency and scalability in applications on this class of machines is the use of a many-tasking runtime system employing many lightweight, concurrent threads. Yet a priori estimation of the potential performance and scalability impact of such runtime systems on existing applications developed around the bulk synchronous parallel (BSP) model is not well understood. In this work, we present a case study of a BSP particle-in-cell benchmark code which has been ported to a many-tasking runtime system. The 3-D Gyrokinetic Toroidal code (GTC) is examined in its original MPI form and compared with a port to the High Performance ParalleX 3 (HPX-3) runtime system. Phase overlap, oversubscription behavior, and work rebalancing in the implementation are explored. Results for GTC using the SST/macro simulator complement the implementation results. Finally, an analytic performance model for GTC is presented in order to guide future implementation efforts.
doi:10.1007/978-3-319-10214-6_7 fatcat:rpvyc7r6dbcvte6dm4q74cbl4i