Performance Modeling of Streaming Kernels and Sparse Matrix-Vector Multiplication on A64FX [article]

Christie L. Alappat, Jan Laukemann, Thomas Gruber, Georg Hager, Gerhard Wellein, Nils Meyer, Tilo Wettig
2020 arXiv pre-print
The A64FX CPU powers the current number one supercomputer on the Top500 list. Although it is a traditional cache-based multicore processor, its peak performance and memory bandwidth rival accelerator devices. Generating efficient code for such a new architecture requires a good understanding of its performance features. Using these features, we construct the Execution-Cache-Memory (ECM) performance model for the A64FX processor in the FX700 supercomputer and validate it using streaming loops.
We also identify architectural peculiarities and derive optimization hints. Applying the ECM model to sparse matrix-vector multiplication (SpMV), we explain why the CRS matrix storage format is inappropriate and show how the SELL-C-sigma format, with suitable code optimizations, can achieve bandwidth saturation for SpMV.
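For readers unfamiliar with the two storage formats named above, the following is a minimal pure-Python sketch of SpMV in CRS and in SELL-C-sigma. It is not the authors' implementation and says nothing about A64FX performance; it only illustrates the data layouts. All function and variable names (`spmv_crs`, `crs_to_sell`, `spmv_sell`, `chunk_ptr`, and so on) are hypothetical, and the padding convention (column index 0 with value 0.0) is an assumption made for simplicity.

```python
def spmv_crs(row_ptr, col_idx, val, x):
    """Compute y = A @ x with A stored in CRS (also called CSR)."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        # The inner loop runs over one row's nonzeros, so it can be very short.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += val[k] * x[col_idx[k]]
    return y


def crs_to_sell(row_ptr, col_idx, val, C, sigma):
    """Convert CRS to SELL-C-sigma (illustrative layout, assumed conventions).

    Rows are sorted by descending length inside windows of sigma rows,
    grouped into chunks of C rows, zero-padded to the longest row of the
    chunk, and stored column-major within each chunk.
    """
    n = len(row_ptr) - 1
    length = [row_ptr[i + 1] - row_ptr[i] for i in range(n)]
    perm = []
    for start in range(0, n, sigma):
        window = range(start, min(start + sigma, n))
        perm += sorted(window, key=lambda i: -length[i])
    chunk_ptr, chunk_width, s_col, s_val = [0], [], [], []
    for start in range(0, n, C):
        rows = perm[start:start + C]
        width = max(length[i] for i in rows)
        for j in range(width):
            for r in range(C):
                if r < len(rows) and j < length[rows[r]]:
                    k = row_ptr[rows[r]] + j
                    s_col.append(col_idx[k])
                    s_val.append(val[k])
                else:
                    # Padding entry: contributes 0.0 * x[0] to the sum.
                    s_col.append(0)
                    s_val.append(0.0)
        chunk_width.append(width)
        chunk_ptr.append(len(s_val))
    return perm, chunk_ptr, chunk_width, s_col, s_val


def spmv_sell(perm, chunk_ptr, chunk_width, s_col, s_val, x, C):
    """Compute y = A @ x with A stored in the SELL-C-sigma layout above."""
    n = len(perm)
    y = [0.0] * n
    for c, width in enumerate(chunk_width):
        acc = [0.0] * C
        base = chunk_ptr[c]
        for j in range(width):
            # The C rows of a chunk are independent, which maps onto SIMD lanes.
            for r in range(C):
                k = base + j * C + r
                acc[r] += s_val[k] * x[s_col[k]]
        for r in range(C):
            row = c * C + r
            if row < n:
                # Undo the sorting permutation when writing the result.
                y[perm[row]] = acc[r]
    return y
```

In this sketch the innermost CRS loop has as many iterations as the row has nonzeros, whereas the innermost SELL-C-sigma loop always has exactly C iterations over independent rows. That fixed-length, lane-parallel structure is the property usually cited as making the format friendlier to wide SIMD units, at the cost of storing some padding.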
arXiv:2009.13903v1 fatcat:f5iikrcor5aurlwjpe4d74e3xy