Communication-optimal iterative methods

J Demmel, M Hoemmen, M Mohiyuddin, K Yelick
2009 Journal of Physics: Conference Series
Data movement, both within the memory system of a single processor node and between multiple nodes in a system, limits the performance of many Krylov subspace methods that solve sparse linear systems and eigenvalue problems. Here, s iterations of algorithms such as CG, GMRES, Lanczos, and Arnoldi perform s sparse matrix-vector multiplications and Ω(s) vector reductions, resulting in Ω(s) growth in both single-node and network communication. By reorganizing the sparse matrix kernel to compute a set of matrix-vector products at once, and reorganizing the rest of the algorithm accordingly, we can perform s iterations by sending O(log P) messages instead of Ω(s·log P) messages on a parallel machine, and by reading the on-node components of the matrix A from DRAM to cache just once on a single node instead of s times. This reduces communication to the minimum possible. We discuss both the algorithms and an implementation of GMRES on a single node of an 8-core Intel Clovertown. Our implementations achieve significant speedups over the conventional algorithms.
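To illustrate the quantity the reorganized sparse matrix kernel produces, the sketch below (not the authors' code; the names csr_t, spmv, and krylov_basis are hypothetical) computes the s-step Krylov basis [x, Ax, A²x, ..., Aˢx] for a CSR matrix in the straightforward way, reading A from memory s times. The communication-avoiding "matrix powers" kernel described in the paper produces the same basis while reading the on-node part of A from DRAM only once and sending O(log P) messages.

```c
/* Hypothetical sketch: s-step Krylov basis via repeated SpMV.
 * The communication-avoiding kernel computes the same output
 * with a single pass over A (not shown here). */

typedef struct {          /* compressed sparse row (CSR) matrix */
    int n;                /* dimension */
    const int *rowptr;    /* length n+1 */
    const int *colidx;    /* length rowptr[n] */
    const double *val;    /* length rowptr[n] */
} csr_t;

/* y = A * x */
static void spmv(const csr_t *A, const double *x, double *y) {
    for (int i = 0; i < A->n; ++i) {
        double sum = 0.0;
        for (int k = A->rowptr[i]; k < A->rowptr[i + 1]; ++k)
            sum += A->val[k] * x[A->colidx[k]];
        y[i] = sum;
    }
}

/* V is (s+1) x n, row-major: V[j*n + i] holds (A^j * x)[i]. */
void krylov_basis(const csr_t *A, const double *x, int s, double *V) {
    for (int i = 0; i < A->n; ++i)
        V[i] = x[i];                                 /* V[0]   = x      */
    for (int j = 0; j < s; ++j)
        spmv(A, &V[j * A->n], &V[(j + 1) * A->n]);   /* V[j+1] = A*V[j] */
}
```

The naive loop above moves the matrix through the memory hierarchy once per product; the paper's reorganization exploits the fact that all s products can be formed from locally cached rows plus a one-time exchange of boundary data, which is what removes the factor of s from both DRAM traffic and message counts.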
doi:10.1088/1742-6596/180/1/012040