Reduced-Bandwidth Multithreaded Algorithms for Sparse Matrix-Vector Multiplication
2011
2011 IEEE International Parallel & Distributed Processing Symposium
On multicore architectures, the ratio of peak memory bandwidth to peak floating-point performance (byte:flop ratio) is decreasing as core counts increase, further limiting the performance of bandwidth-limited applications. Multiplying a sparse matrix (as well as its transpose in the unsymmetric case) with a dense vector is the core of sparse iterative methods. In this paper, we present a new multithreaded algorithm for the symmetric case which potentially cuts the bandwidth requirements in half
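
The bandwidth saving in the symmetric case can be illustrated with a minimal serial sketch. It is not the paper's multithreaded algorithm: it assumes a plain CSR layout holding only the upper triangle (diagonal included), and the function name `symmetric_spmv_upper` and its parameters are hypothetical. Each stored off-diagonal entry is read once from memory but used twice, for `y[i]` and for `y[j]`, so roughly half of the matrix data needs to be streamed.

```python
def symmetric_spmv_upper(n, indptr, indices, data, x):
    """Compute y = A @ x for a symmetric n-by-n matrix A.

    Only the upper triangle of A (including the diagonal) is stored,
    in CSR form: indptr, indices, data. This layout is an assumption
    made for illustration.
    """
    y = [0.0] * n
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            a = data[k]
            # Contribution of A[i][j] to row i.
            y[i] += a * x[j]
            # Contribution of the mirrored entry A[j][i] to row j.
            # Skipped on the diagonal to avoid counting it twice.
            if j != i:
                y[j] += a * x[i]
    return y


# Example: A = [[2, 1, 0],
#               [1, 3, 4],
#               [0, 4, 5]]   (only the upper triangle is stored)
indptr = [0, 2, 4, 5]
indices = [0, 1, 1, 2, 2]
data = [2.0, 1.0, 3.0, 4.0, 5.0]
x = [1.0, 2.0, 3.0]

y = symmetric_spmv_upper(3, indptr, indices, data, x)
print(y)  # [4.0, 19.0, 23.0]
```

The scattered update `y[j] += a * x[i]` is the part that makes a multithreaded version hard: two threads working on different rows can write to the same `y[j]`, so a parallel implementation has to prevent or resolve those conflicts. The sketch above sidesteps the issue by running serially.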
doi:10.1109/ipdps.2011.73
dblp:conf/ipps/BulucWOD11
fatcat:37gp3czbwzaqflrypenxnboyz4