Anatomy of high-performance matrix multiplication
2008
ACM Transactions on Mathematical Software
We present the basic principles that underlie the high-performance implementation of the matrix-matrix multiplication that is part of the widely used GotoBLAS library. Design decisions are justified by successively refining a model of architectures with multilevel memories. A simple but effective algorithm for executing this operation results. Implementations on a broad selection of architectures are shown to achieve near-peak performance.
doi:10.1145/1356052.1356053
fatcat:zauqgyyl4vc2hlgswnmr5pncri