Hands-on Performance Tuning of 3D Finite Difference Earthquake Simulation on GPU Fermi Chipset

Jun Zhou, Didem Unat, Dong Ju Choi, Clark C. Guest, Yifeng Cui
2012 Procedia Computer Science  
3D simulation of earthquake ground motion is one of the most challenging computational problems in science. The emergence of graphic processing units (GPU) as an effective alternative to traditional general purpose processors has become increasingly capable in terms of accelerating scientific computing research. In this paper, we describe our experiences in porting AWP-ODC, a 3D finite difference seismic wave propagation code, to the latest GPU Fermi chipset. We completely rewrote this
more » ... ased 13-point asymmetric stencil computation code in C and MPI-CUDA in order to take advantage of the powerful GPU computing capabilities. Our new CUDA code implemented the asymmetric 3D stencil on Fermi to make the best use of GPU on-chip memory for an aggressive parallel efficiency. Benchmark on NVIDIA Tesla M2090 demonstrated 10x speedup versus the original fully optimized AWP-ODC FORTRAN MPI code running on a single Intel Nehalem 2.4 GHz CPU socket (4 cores/CPU), and 15x speedup versus the same MPI code running on a single AMD Istanbul 2.6GHz CPU socket (6 cores/CPU). Sustained single-GPU performance of 143.8 GFLOPS in single precision is benchmarked for the testing case of 128x128x960 mesh size.
doi:10.1016/j.procs.2012.04.104 fatcat:zosor67murcbxahkgcy6ydgkjq