Quantifying performance benefits of overlap using MPI-2 in a seismic modeling application

Sreeram Potluri, Dhabhaleswar K. Panda, Ping Lai, Karen Tomko, Sayantan Sur, Yifeng Cui, Mahidhar Tatineni, Karl W. Schulz, William L. Barth, Amitava Majumdar
2010 Proceedings of the 24th ACM International Conference on Supercomputing - ICS '10  
AWM-Olsen is a widely used ground motion simulation code based on a parallel finite difference solution of the 3-D velocity-stress wave equation. This application runs on tens of thousands of cores and consumes several million CPU hours on the TeraGrid clusters every year. A significant portion of its run time (37% in a 4,096-process run) is spent in MPI communication routines. Hence, it demands an optimized communication design coupled with a low-latency, high-bandwidth network and an ... communication subsystem for good performance. In this paper, we analyze the performance bottlenecks of the application with regard to the time spent in MPI communication calls. We find that much of this time can be overlapped with computation using MPI non-blocking calls. We use both two-sided and MPI-2 one-sided communication semantics to redesign the communication in AWM-Olsen. We find that with our new design, using MPI-2 one-sided communication semantics, the entire application can be sped up by 12% at 4K processes and by 10% at 8K processes on a state-of-the-art InfiniBand cluster, Ranger, at the Texas Advanced Computing Center (TACC).
doi:10.1145/1810085.1810092 dblp:conf/ics/PotluriLTSCTSBMP10 fatcat:el2ik747ffhlbjtwu3w6fkdl6y
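
The figures in the abstract can be related with a short back-of-the-envelope calculation. The sketch below is not from the paper. It assumes a simple two-phase model in which run time is the sum of computation and communication, and it reads "sped up by 12%" as a 12% reduction in total run time, which is one possible interpretation. The function name `overlap_bounds` and its parameters are hypothetical, chosen for illustration only.

```python
def overlap_bounds(comm_fraction, runtime_reduction):
    """Relate communication share to overlap gains in a two-phase model.

    comm_fraction: share of baseline run time spent in communication.
    runtime_reduction: observed fractional drop in total run time
        (assumed here to be what "sped up by X%" means).
    Returns (max_reduction, hidden_share):
        max_reduction: best-case run time reduction from perfect overlap.
        hidden_share: share of communication time that the observed
            reduction corresponds to, if overlap is the only source of gain.
    """
    if not 0.0 < comm_fraction < 1.0:
        raise ValueError("comm_fraction must be strictly between 0 and 1")

    # Communication can only hide behind computation, so the best case
    # removes whichever of the two phases is shorter.
    max_reduction = min(comm_fraction, 1.0 - comm_fraction)

    if not 0.0 <= runtime_reduction <= max_reduction:
        raise ValueError("runtime_reduction exceeds what overlap can give")

    hidden_share = runtime_reduction / comm_fraction
    return max_reduction, hidden_share


# Values quoted in the abstract for the 4,096-process run.
max_reduction, hidden_share = overlap_bounds(0.37, 0.12)
print(f"upper bound on run time reduction: {max_reduction:.0%}")
print(f"communication time hidden (under this model): {hidden_share:.0%}")
```

Under these assumptions, a 12% reduction would correspond to roughly a third of the reported communication time being hidden behind computation. The 37% figure is given only for the 4,096-process run, so the same calculation is not applied to the 8K-process result.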