Pretty Good Accuracy in Matrix Multiplication with GPUs

Matthew Badin, Lubomir Bic, Michael Dillencourt, Alexandru Nicolau
2010 Ninth International Symposium on Parallel and Distributed Computing
With systems such as Roadrunner, there is a trend in supercomputing to offload parallel tasks to special-purpose co-processors composed of many relatively simple scalar processors. The cheaper commodity-class equivalent of such a processor is the graphics card, potentially offering supercomputer power within the confines of a desktop PC. Graphics cards, however, are not without problems. These range from the lack of double precision on most cards to a fairly steep drop in performance when using double precision on others; the end result is that, in order to utilize the graphics card, the computation must be done in single precision. In this paper we propose a method whereby a whole digit of the accuracy lost in single-precision matrix multiply can be regained with only a 7% loss in performance, by applying a compensated summation algorithm in a previously unexplored manner. At first glance this approach should not provide any benefit, but empirical evidence shows that the idea, though simple, yields unexpected gains in accuracy at little cost to performance.
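For readers unfamiliar with compensated summation, here is a minimal sketch of the general idea. It applies the textbook Kahan algorithm to the inner-product accumulation of a matrix multiply. The abstract does not describe the paper's specific variant, so this should not be read as the authors' method. All names (`f32`, `dot_naive`, `dot_kahan`, `dot_reference`, `matmul`) are hypothetical. Single precision is emulated in pure Python by rounding every intermediate result through a 4-byte float, and inputs are assumed to be representable in single precision already.

```python
import math
import struct


def f32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_naive(a, b):
    """Inner product with every multiply and add rounded to single precision."""
    s = 0.0
    for x, y in zip(a, b):
        s = f32(s + f32(x * y))
    return s


def dot_kahan(a, b):
    """Inner product using Kahan compensated summation in single precision."""
    s = 0.0  # running sum
    c = 0.0  # running compensation: low-order bits lost by earlier additions
    for x, y in zip(a, b):
        term = f32(f32(x * y) - c)   # feed the previously lost bits back in
        t = f32(s + term)            # this addition may drop low-order bits
        c = f32(f32(t - s) - term)   # recover what was just dropped
        s = t
    return s


def dot_reference(a, b):
    """Reference value: double-precision products, summed with math.fsum."""
    return math.fsum(x * y for x, y in zip(a, b))


def matmul(A, B, dot):
    """Multiply list-of-lists matrices A and B using the given inner product."""
    cols = list(zip(*B))
    return [[dot(row, col) for col in cols] for row in A]
```

Calling `matmul(A, B, dot_naive)` and `matmul(A, B, dot_kahan)` on the same inputs and comparing each result with `matmul(A, B, dot_reference)` gives a rough picture of how much accuracy the compensation recovers. The extra subtractions per term are the source of the performance cost that any compensated scheme has to pay.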
doi:10.1109/ispdc.2010.12 dblp:conf/ispdc/BadinBDN10 fatcat:5ggvtf5ai5defpwleykhlsds4e