Benefits of SMT and of Parallel Transpose Algorithm for the Large-Scale GYSELA Application

Guillaume Latu, Julien Bigot, Nicolas Bouzat, Judit Gimenez, Virginie Grandgirard
2016 Proceedings of the Platform for Advanced Scientific Computing Conference (PASC '16)  
This article describes how we increased the performance and extended the features of a large parallel application through the use of simultaneous multithreading (SMT) and by designing a robust parallel transpose algorithm. The semi-Lagrangian code Gysela typically performs large physics simulations using a few thousand cores, from 1k up to 16k cores, on x86-based clusters. However, simulations with finer resolutions and with kinetic electrons increase those needs by a huge factor, ... ing a good example of applications requiring Exascale machines. To improve Gysela compute times, we take advantage of the efficient SMT implementations available on recent Intel architectures. We also analyze the cost of a transposition communication scheme that, in our case, involves a large number of cores. Adapting the code for load balance when using SMT, combined with a good deployment strategy, led to a significant reduction in execution time, of up to 38%.
doi:10.1145/2929908.2929912 fatcat:4ykib3ax6zb2jkzdog476p3itq