An Optimized Sparse Approximate Matrix Multiply for Matrices with Decay

Nicolas Bock, Matt Challacombe
2013 SIAM Journal on Scientific Computing  
We present an optimized single-precision implementation of the Sparse Approximate Matrix Multiply (SpAMM) [N. Bock, arXiv 1011.3534 (2010)], a fast algorithm for matrix-matrix multiplication for matrices with decay that achieves an O(n log n) computational complexity with respect to matrix dimension n. We find that the max norm of the error achieved with a SpAMM tolerance below 2 × 10^−8 is lower than that of the single-precision SGEMM for dense quantum chemical matrices, while outperforming ... M with a cross-over already for small matrices (n ∼ 1000). Relative to naive implementations of SpAMM using Intel's Math Kernel Library (MKL) or AMD's Core Math Library (ACML), our optimized version is found to be significantly faster. Detailed performance comparisons are made for quantum chemical matrices with differently structured sub-blocks. Finally, we discuss the potential of improved hardware prefetch to yield 2-3x speedups.
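The abstract does not spell out how SpAMM works, so the following is only a minimal illustrative sketch of the general idea behind it: recurse over a quadtree of sub-blocks and skip any sub-product whose norm product falls below the tolerance. It is not the authors' optimized single-precision kernel. The names `spamm`, `decay_matrix`, `tau`, and `leaf` are hypothetical, the sketch uses double-precision NumPy, assumes square matrices whose dimension is a power of two, and recomputes Frobenius norms on every call rather than storing them in the tree.

```python
import numpy as np

def decay_matrix(n, rate=1.0):
    """Hypothetical test matrix: entries decay exponentially away from the diagonal."""
    idx = np.arange(n)
    return np.exp(-rate * np.abs(idx[:, None] - idx[None, :]))

def spamm(a, b, tau, leaf=4):
    """Approximate a @ b, skipping sub-products with ||a||_F * ||b||_F < tau.

    Assumes a and b are square with the same power-of-two dimension.
    """
    n = a.shape[0]
    # The Frobenius norm is submultiplicative, so this product bounds ||a @ b||_F.
    if np.linalg.norm(a) * np.linalg.norm(b) < tau:
        return np.zeros((n, n), dtype=a.dtype)
    if n <= leaf:
        return a @ b  # dense multiply at the leaves
    h = n // 2
    c = np.zeros((n, n), dtype=a.dtype)
    # Standard 2x2 block product: C_ij = sum_k A_ik B_kj, applied recursively.
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                c[i*h:(i+1)*h, j*h:(j+1)*h] += spamm(
                    a[i*h:(i+1)*h, k*h:(k+1)*h],
                    b[k*h:(k+1)*h, j*h:(j+1)*h],
                    tau, leaf)
    return c

a = decay_matrix(64)
exact = a @ a
approx = spamm(a, a, tau=1e-8)
max_err = np.max(np.abs(approx - exact))  # max-norm error of the approximation
```

With `tau=0.0` no sub-product is skipped and the result matches the dense product up to rounding; larger tolerances trade accuracy for skipped work, which is where the decay of the matrix elements pays off.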
doi:10.1137/120870761 fatcat:o3v2jvavyrb7nlsdmxy3n7jcce