Performance Modeling of Streaming Kernels and Sparse Matrix-Vector Multiplication on A64FX
[article]
2020
arXiv
pre-print
The A64FX CPU powers the number one supercomputer on the Top500 list as of 2020. Although it is a traditional cache-based multicore processor, its peak performance and memory bandwidth rival those of accelerator devices. Generating efficient code for such a new architecture requires a good understanding of its performance features. Using these features, we construct the Execution-Cache-Memory (ECM) performance model for the A64FX processor in the FX700 supercomputer and validate it using streaming loops.
arXiv:2009.13903v1
fatcat:f5iikrcor5aurlwjpe4d74e3xy