A Visual Network Analysis Method for Large-Scale Parallel I/O Systems

Carmen Sigovan, Chris Muelder, Kwan-Liu Ma, Jason Cope, Kamil Iskra, Robert Ross
2013 IEEE 27th International Symposium on Parallel and Distributed Processing  
Parallel applications rely on I/O to load data, store end results, and protect partial results from being lost to system failure. Parallel I/O performance thus has a direct and significant impact on application performance. Because supercomputer I/O systems are large and complex, one cannot directly analyze their activity traces. While several visual or automated analysis tools for large-scale HPC log data exist, analysis research in the high-performance computing field is geared toward ...ion performance rather than I/O performance. Additionally, existing methods usually do not capture the network characteristics of HPC I/O systems. We present a visual analysis method for I/O trace data that takes into account the fact that HPC I/O systems can be represented as networks. We illustrate performance metrics in a way that facilitates the identification of abnormal behavior or performance problems. We demonstrate our approach on I/O traces collected from existing systems at different scales.
doi:10.1109/ipdps.2013.96 dblp:conf/ipps/SigovanMMCIR13 fatcat:6kdhf3ellvcgbjlnsp7mkqrrfi