pMR: A high-performance communication library [article]

Peter Georg, Daniel Richtmann, Tilo Wettig
2017 arXiv   pre-print
On many parallel machines, the time LQCD applications spent in communication is a significant contribution to the total wall-clock time, especially in the strong-scaling limit. We present a novel high-performance communication library that can be used as a de facto drop-in replacement for MPI in existing software. Its lightweight nature that avoids some of the unnecessary overhead introduced by MPI allows us to improve the communication performance of applications without any algorithmic or
more » ... licated implementation changes. As a first real-world benchmark, we make use of the pMR library in the coarse-grid solve of the Regensburg implementation of the DD-αAMG algorithm. On realistic lattices, we see an improvement of a factor 2x in pure communication time and total execution time savings of up to 20
arXiv:1701.08521v1 fatcat:wv3hhyswyjefze3pkvydobjise