Added Concurrency to Improve MPI Performance on Multicore

Humaira Kamal, Alan Wagner
2012 41st International Conference on Parallel Processing
MPI implementations typically equate an MPI process with an OS process, resulting in a coarse-grain programming model where MPI processes are bound to the physical cores. Fine-Grain MPI (FG-MPI) extends the MPICH2 implementation of MPI and implements an integrated runtime system to allow multiple MPI processes to execute concurrently inside an OS process. FG-MPI's integrated approach makes it possible to add more concurrency than available parallelism, while minimizing the overheads related to context switches, scheduling and synchronization. In this paper we evaluate the benefits of added concurrency for cache awareness and message size. We show that performance gains are possible by using FG-MPI to adjust the grain size of a program to better fit the cache, and that there are potential advantages in passing smaller rather than larger messages. We evaluate the use of FG-MPI on the complete set of the NAS parallel benchmarks over large problem sizes, where we show significant performance improvement (20%-30%) for three of the eight benchmarks. We discuss the characteristics of the benchmarks with regard to the trade-offs between the added costs and benefits.
doi:10.1109/icpp.2012.15 dblp:conf/icpp/KamalW12 fatcat:2pkukyyi75cxnc7zm3frxrbi34