Poster reception: Scalable compression and replay of communication traces in massively parallel environments

Michael Noeth, Jaydeep Marathe, Frank Mueller, Martin Schulz, Bronis de Supinski
2006. Proceedings of the 2006 ACM/IEEE Conference on Supercomputing (SC '06)
Characterizing the communication behavior of large-scale applications is a difficult and costly task because of code and system complexity and long execution times. An alternative to running the actual codes is to gather their communication traces and then replay them, which facilitates application tuning and future procurements. Past approaches lacked lossless, scalable trace collection. We contribute an approach that provides near constant-size communication traces regardless of the number of nodes while preserving structural information. We introduce intra- and inter-node compression techniques for MPI events and present results of our implementation for BlueGene/L. Given this novel capability, we discuss its impact on communication tuning and beyond. To the best of our knowledge, a scalable, near constant-size representation of MPI traces combined with deterministic MPI call replay is without precedent.
[…] Locally stored profiling files are constrained in size by the number of unique call sites of MPI events, which is independent of the number of nodes. However, mpiP does not preserve the structure and temporal ordering of events, which limits its use to high-level analysis. Other communication analysis tools fall into the former or latter category: either their storage requirements do not scale, or they are lossy with respect to program structure and temporal ordering.
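The abstract does not say how intra-node compression works. As a rough illustration only, the sketch below assumes a scheme much simpler than the paper's. Each rank's event stream is a list of strings naming MPI call sites, and immediately repeating subsequences, such as the body of a timestep loop, are folded into `(count, body)` blocks. The names `compress_loops` and `expand` and the `max_window` parameter are hypothetical. Only one level of repetition is detected, so nested loops are not folded. The round trip through `expand` shows that the folding loses no events and no ordering.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: an event is a string naming an MPI call site.
Event = str
# A compressed item is either a single event or a (count, body) repeat block.
Item = Union[Event, Tuple[int, Tuple[Event, ...]]]


def compress_loops(events: List[Event], max_window: int = 8) -> List[Item]:
    """Greedily fold immediately repeating subsequences of up to max_window events."""
    out: List[Item] = []
    i, n = 0, len(events)
    while i < n:
        best_w, best_count = 1, 1
        # Try every candidate loop-body length that could repeat at least twice.
        for w in range(1, min(max_window, (n - i) // 2) + 1):
            body = events[i:i + w]
            count = 1
            while events[i + count * w:i + (count + 1) * w] == body:
                count += 1
            # Keep the candidate that covers the most events.
            if count > 1 and count * w > best_count * best_w:
                best_w, best_count = w, count
        if best_count > 1:
            out.append((best_count, tuple(events[i:i + best_w])))
            i += best_count * best_w
        else:
            out.append(events[i])
            i += 1
    return out


def expand(items: List[Item]) -> List[Event]:
    """Invert compress_loops, reproducing the original event order exactly."""
    out: List[Event] = []
    for item in items:
        if isinstance(item, tuple):
            count, body = item
            out.extend(list(body) * count)
        else:
            out.append(item)
    return out


if __name__ == "__main__":
    trace = ["MPI_Init"] + ["MPI_Send", "MPI_Recv"] * 1000 + ["MPI_Finalize"]
    packed = compress_loops(trace)
    print(packed)  # ['MPI_Init', (1000, ('MPI_Send', 'MPI_Recv')), 'MPI_Finalize']
    assert expand(packed) == trace
```

Here a loop of 1000 send/receive pairs collapses to a single block, so the compressed per-rank length no longer grows with the iteration count.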
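Intra-node compression alone still leaves one trace per rank. A plausible way to make total size independent of node count, assumed here for illustration and not taken from the paper, is to store each distinct per-rank trace once, together with a compact description of the ranks that produced it. The sketch below groups ranks by identical trace and encodes each rank list as hypothetical `(start, stride, count)` triples. The names `merge_across_ranks`, `to_strided_ranges` and `pair_exchange_traces` are illustrative. Traces that differ only in peer ranks would not merge under this exact-match rule.

```python
from typing import Dict, List, Tuple

# Hypothetical: a per-rank trace is a tuple of event names (already intra-node compressed).
Trace = Tuple[str, ...]
# (first rank, stride between ranks, number of ranks)
RankRange = Tuple[int, int, int]


def to_strided_ranges(ranks: List[int]) -> List[RankRange]:
    """Greedily encode a sorted rank list as (start, stride, count) triples."""
    out: List[RankRange] = []
    i = 0
    while i < len(ranks):
        if i + 1 == len(ranks):
            out.append((ranks[i], 1, 1))
            break
        stride = ranks[i + 1] - ranks[i]
        count = 2
        while i + count < len(ranks) and ranks[i + count] - ranks[i + count - 1] == stride:
            count += 1
        out.append((ranks[i], stride, count))
        i += count
    return out


def merge_across_ranks(traces: Dict[int, Trace]) -> List[Tuple[Trace, List[RankRange]]]:
    """Store each distinct trace once, with a compact list of the ranks that produced it."""
    groups: Dict[Trace, List[int]] = {}
    for rank in sorted(traces):
        groups.setdefault(traces[rank], []).append(rank)
    return [(trace, to_strided_ranges(ranks)) for trace, ranks in groups.items()]


def pair_exchange_traces(nprocs: int) -> Dict[int, Trace]:
    """Toy SPMD workload: even ranks send, odd ranks receive."""
    return {
        r: ("MPI_Init", "MPI_Send" if r % 2 == 0 else "MPI_Recv", "MPI_Finalize")
        for r in range(nprocs)
    }


if __name__ == "__main__":
    for nprocs in (16, 4096):
        merged = merge_across_ranks(pair_exchange_traces(nprocs))
        print(nprocs, len(merged), [ranges for _, ranges in merged])
```

For both 16 and 4096 ranks, the merged result has two entries, each carrying a single rank triple. The stored size therefore stays flat as the rank count grows for this regular toy workload.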
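The point about profiling files can be made concrete. The sketch below is a simplification and does not reproduce mpiP's actual report format. It aggregates hypothetical `(call_site, nbytes)` events into per-call-site counts and byte totals. The result's size is bounded by the number of distinct call sites. However, two traces containing the same events in different orders yield identical profiles, which is the loss of temporal ordering described above.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical event record: (call site label, message size in bytes).
Event = Tuple[str, int]


def profile(events: List[Event]) -> Dict[str, Tuple[int, int]]:
    """Aggregate events into {call_site: (call count, total bytes)}."""
    calls: Counter = Counter()
    volume: Counter = Counter()
    for site, nbytes in events:
        calls[site] += 1
        volume[site] += nbytes
    return {site: (calls[site], volume[site]) for site in calls}


if __name__ == "__main__":
    interleaved = [("send@solver.c:10", 8), ("recv@solver.c:11", 8)] * 3
    batched = [("send@solver.c:10", 8)] * 3 + [("recv@solver.c:11", 8)] * 3
    # Different temporal orders, identical profiles: ordering is lost.
    assert interleaved != batched
    assert profile(interleaved) == profile(batched)
    print(profile(interleaved))
```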
doi:10.1145/1188455.1188605 dblp:conf/sc/NoethMMSS06 fatcat:zp2zc6dcv5f3xkw5a46cdxukya