Implementation of Strassen's algorithm for matrix multiplication

Steven Huss-Lederman, Elaine M. Jacobson, Anna Tsao, Thomas Turnbull, Jeremy R. Johnson
1996 Proceedings of the 1996 ACM/IEEE conference on Supercomputing (CDROM) - Supercomputing '96  
In this paper we report on the development of an efficient and portable implementation of Strassen's matrix multiplication algorithm. Our implementation is designed to be used in place of DGEMM, the Level 3 BLAS matrix multiplication routine. Efficient performance will be obtained for all matrix sizes and shapes, and the additional memory needed for temporary variables has been minimized. Replacing DGEMM with our routine should provide a significant performance gain for large matrices while providing the same performance for small matrices. We measure performance of our code on the IBM RS/6000, CRAY YMP C90, and CRAY T3D single processor, and offer comparisons to other codes. Our performance data reconfirms that Strassen's algorithm is practical for realistic size matrices. The usefulness of our implementation is demonstrated by replacing DGEMM with our routine in a large application code.
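The idea summarized in the abstract, using Strassen's recursion for large matrices and the conventional product for small ones, can be illustrated with a minimal sketch. This is not the authors' code: it uses the classic seven-product formulation, assumes square inputs, and simply falls back to a plain triple-loop product (standing in for DGEMM) when the dimension is odd or at most a hypothetical `cutoff`. The function names and the cutoff value are illustrative assumptions.

```python
def matmul(A, B):
    """Conventional product; stands in for DGEMM in this sketch."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]


def add(X, Y):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def sub(X, Y):
    """Element-wise difference of two equally sized matrices."""
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]


def split(M):
    """Split an even-sized square matrix into four quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B, cutoff=2):
    """Multiply square matrices A and B with Strassen's recursion.

    Assumption: `cutoff` is a hypothetical tuning parameter. Sizes at or
    below it, and odd sizes, use the conventional product instead.
    """
    n = len(A)
    if n <= cutoff or n % 2:
        return matmul(A, B)

    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)

    # Seven recursive products instead of the usual eight.
    M1 = strassen(add(A11, A22), add(B11, B22), cutoff)
    M2 = strassen(add(A21, A22), B11, cutoff)
    M3 = strassen(A11, sub(B12, B22), cutoff)
    M4 = strassen(A22, sub(B21, B11), cutoff)
    M5 = strassen(add(A11, A12), B22, cutoff)
    M6 = strassen(sub(A21, A11), add(B11, B12), cutoff)
    M7 = strassen(sub(A12, A22), add(B21, B22), cutoff)

    # Combine the products into the four quadrants of the result.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)

    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom
```

In a sketch like this the cutoff controls where the recursion stops; a practical value depends on the machine and on the underlying conventional routine, so it would have to be determined by measurement rather than taken from this example.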
doi:10.1145/369028.369096 fatcat:hls63elv5ngcbjpxe6yr7xoxni