A study of the effects of machine geometry and mapping on distributed transpose performance

Maria Eleftheriou, Blake G. Fitch, Aleksandr Rayshubskiy, T.J. Christopher Ward, Phillip Heidelberger, Robert S. Germain
2008 Proceedings of the 2008 conference on Computing frontiers - CF '08  
This paper describes a parallel strategy to extend the scalability of a small 3D FFT on thousands of Blue Gene/L processors. The approach is to execute the intermediate phases of the 3D FFT on smaller processor subsets. Performance measurements of the standalone 3D FFT on two communication protocols, MPI and BG/L ADE [19], are presented. While the MPI-based and BG/L ADE-based implementations of the 3D FFT exhibited qualitatively similar behavior, the BG/L ADE-based version has lower communication cost than the MPI-based version for small message sizes. Measurements also show that the proposed approach significantly improves the performance of Particle-Mesh-based N-body simulations at the limits of scalability.
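The "phases" mentioned in the abstract can be illustrated with a minimal single-process sketch, assuming the standard separable formulation in which a 3D DFT is computed as three passes of 1D transforms, one per axis. In a distributed implementation the data would be redistributed (transposed) between passes; that step, and the paper's mapping of intermediate phases onto processor subsets, are not modeled here. The function names `dft_1d`, `fft3d_by_phases`, and `dft3d_direct` are hypothetical and chosen for illustration only, and a naive O(n^2) 1D DFT stands in for a real FFT.

```python
import cmath


def dft_1d(values):
    """Naive 1D discrete Fourier transform (stand-in for a real FFT)."""
    n = len(values)
    return [
        sum(values[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]


def fft3d_by_phases(grid):
    """3D DFT of grid[x][y][z] computed as three 1D phases (z, then y, then x)."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    out = [[[complex(grid[x][y][z]) for z in range(nz)]
            for y in range(ny)] for x in range(nx)]

    # Phase 1: 1D transforms along z.
    for x in range(nx):
        for y in range(ny):
            out[x][y] = dft_1d(out[x][y])

    # Phase 2: 1D transforms along y (a distributed code would transpose first).
    for x in range(nx):
        for z in range(nz):
            line = dft_1d([out[x][y][z] for y in range(ny)])
            for y in range(ny):
                out[x][y][z] = line[y]

    # Phase 3: 1D transforms along x (again preceded by a transpose when distributed).
    for y in range(ny):
        for z in range(nz):
            line = dft_1d([out[x][y][z] for x in range(nx)])
            for x in range(nx):
                out[x][y][z] = line[x]

    return out


def dft3d_direct(grid):
    """Reference 3D DFT evaluated directly from the triple-sum definition."""
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    result = [[[0j] * nz for _ in range(ny)] for _ in range(nx)]
    for kx in range(nx):
        for ky in range(ny):
            for kz in range(nz):
                total = 0j
                for x in range(nx):
                    for y in range(ny):
                        for z in range(nz):
                            phase = kx * x / nx + ky * y / ny + kz * z / nz
                            total += grid[x][y][z] * cmath.exp(-2j * cmath.pi * phase)
                result[kx][ky][kz] = total
    return result
```

Comparing `fft3d_by_phases` against `dft3d_direct` on a small grid is a quick way to check that the phased decomposition reproduces the full 3D transform up to floating-point rounding.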
doi:10.1145/1366230.1366243 dblp:conf/cf/EleftheriouFRWHG08 fatcat:azncncvaqvbovj5lc5zyk3vmym