Characterizing the Performance of Modern Architectures Through Opaque Benchmarks: Pitfalls Learned the Hard Way

Luka Stanisic, Lucas Mello Schnorr, Augustin Degomme, Franz C. Heinrich, Arnaud Legrand, Brice Videau
2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Determining key characteristics of High Performance Computing machines that allow users to predict their performance is an old and recurrent dream. This was, for example, the rationale behind the design of the LogP model, which later evolved into many variants (LogGP, LogGPS, LoGPS, ...) to cope with the evolution and complexity of network technology. Although the network has received a lot of attention, predicting the performance of computation kernels can be very challenging as well. In particular, the tremendous increase of internal parallelism and the deep memory hierarchies of modern multi-core architectures often leave applications limited by the memory access rate. In this context, determining the key characteristics of a machine, such as the peak bandwidth of each cache level, as well as how an application uses the memory hierarchy, can be the key to predicting or extrapolating application performance. Building on such performance models, most high-level simulation-based frameworks characterize the machine and the application separately, then convolve the two signatures to predict overall performance. We evaluate the suitability of such approaches to modern architectures and applications by trying to reproduce the work of others. When trying to build our own framework, we realized that, regardless of the quality of the underlying models or software, most of these frameworks rely on "opaque" benchmarks to characterize the platform. In this article, we report the many pitfalls we encountered when trying to characterize both the network and the memory performance of modern machines. We claim that opaque benchmarks that do not clearly separate experiment design, measurement, and analysis should be avoided as much as possible in a modeling context. Likewise, experimental factors should be identified a priori to make sure the experimental conditions are adequate.
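The LogP family mentioned above models point-to-point communication from a handful of measured parameters. A minimal sketch of the idea, with purely illustrative parameter values (not measurements from the paper), might look like this:

```python
# Hedged sketch of the LogP/LogGP cost models referenced in the abstract.
# All parameter values below are illustrative placeholders, not measurements.

L = 5e-6   # Latency: network transit time for a small message (s)
o = 1e-6   # overhead: CPU time spent sending or receiving (s)
g = 2e-6   # gap: minimum interval between consecutive message injections (s)
G = 1e-9   # Gap per byte (LogGP extension for long messages) (s/byte)

def logp_time(k_messages):
    """LogP-style time for k back-to-back small messages to one peer:
    the sender pays o plus (k-1) injection gaps, then the last message
    needs L to transit and o to be received."""
    return o + (k_messages - 1) * max(g, o) + L + o

def loggp_time(nbytes):
    """LogGP-style time for a single long message of nbytes:
    the per-byte Gap G captures bandwidth, added to latency and overheads."""
    return o + (nbytes - 1) * G + L + o
```

With these placeholder values, a single small message costs L + 2o = 7 microseconds; variants such as LogGPS further refine this with synchronization effects for rendezvous-mode sends.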
[Figure: Overview of a convolution-based prediction framework. Single-processor model: MAPS characterizes the memory performance capabilities of machine M, while the MetaSim Tracer characterizes the memory operations performed by application A; the MetaSim Convolver maps A's memory usage needs onto M's capabilities. Communication model: PMB characterizes the network performance capabilities of M, while MPIDtrace characterizes the network operations performed by A; DIMEMAS maps A's network usage needs onto M's capabilities. Together, the two convolution methods yield a performance prediction of parallel application A on machine M.]
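The convolution approach described in the abstract can be sketched very simply: a machine signature (e.g. peak bandwidth per memory level, as a MAPS-style benchmark would measure) is combined with an application signature (e.g. bytes served by each level, as a MetaSim-style trace would record) to estimate memory time. The numbers and level names below are illustrative assumptions, not data from the paper:

```python
# Hedged sketch of signature convolution: predicted memory time is the sum,
# over memory levels, of the application's traffic at that level divided by
# the machine's measured peak bandwidth there. Real frameworks refine this
# with latency, overlap, and contention corrections.

MACHINE_SIGNATURE = {   # peak bandwidth per level, bytes/s (illustrative)
    "L1": 500e9,
    "L2": 200e9,
    "L3": 100e9,
    "DRAM": 20e9,
}

APP_SIGNATURE = {       # bytes served by each level in one run (illustrative)
    "L1": 80e9,
    "L2": 15e9,
    "L3": 4e9,
    "DRAM": 1e9,
}

def predict_memory_time(machine, app):
    """Convolve the two signatures: each level serves its share of the
    application's traffic at the machine's peak bandwidth for that level."""
    return sum(app[level] / machine[level] for level in app)

time_s = predict_memory_time(MACHINE_SIGNATURE, APP_SIGNATURE)
print(f"predicted memory time: {time_s:.3f} s")
```

The paper's central caution applies directly here: if the benchmark producing MACHINE_SIGNATURE is opaque, i.e. its experiment design, measurements, and analysis cannot be inspected separately, errors in the measured bandwidths propagate silently into every prediction.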
doi:10.1109/ipdpsw.2017.125 dblp:conf/ipps/StanisicSDHLV17 fatcat:gtwqzmmreffutlmzqyhvt4xi7q