GPU-based parallel Householder bidiagonalization

Fangbin Liu, Frank J. Seinstra
2010 Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing - HPDC '10  
In this paper, we discuss the GPU-based implementation and optimization of Householder bidiagonalization, a matrix factorization method that is an integral part of full Singular Value Decomposition (SVD), an important algorithm for many problems in the research domain of Multimedia Content Analysis (MMCA). On cluster computers, complex adaptive run-time techniques often must be implemented to overcome the growing negative performance impact of load imbalances and to ensure reasonable speedup.
We show that the nature of the many-core platform removes the need for such complex run-time parallelization techniques in software, while achieving a performance of 64 gigaflops in double precision on a single-GPU GTX 295, 82% of the theoretical peak performance.
doi:10.1145/1851476.1851512 dblp:conf/hpdc/LiuS10 fatcat:s5wfunigwrf3jpmb4sy3ppuivu
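
For readers unfamiliar with the factorization named in the abstract, the following is a minimal serial sketch of textbook (Golub-Kahan) Householder bidiagonalization in plain Python. It is not the authors' GPU implementation: the function names `householder_vector` and `bidiagonalize` are hypothetical, the input is assumed to be a dense list-of-lists matrix with at least as many rows as columns, and only the bidiagonal factor is returned (the orthogonal factors are not accumulated).

```python
import math


def householder_vector(x):
    """Return (v, beta) so that (I - beta * v v^T) x has zeros after its first entry."""
    norm = math.sqrt(sum(t * t for t in x))
    if norm == 0.0:
        # Nothing to eliminate; beta = 0 makes the reflection the identity.
        return [0.0] * len(x), 0.0
    v = list(x)
    # Add the norm with the sign of x[0] to avoid cancellation.
    v[0] += math.copysign(norm, x[0])
    return v, 2.0 / sum(t * t for t in v)


def bidiagonalize(A):
    """Reduce an m x n matrix (m >= n assumed) to upper bidiagonal form."""
    B = [row[:] for row in A]  # work on a copy
    m, n = len(B), len(B[0])
    for k in range(n):
        # Left reflection: zero column k below the diagonal.
        v, beta = householder_vector([B[i][k] for i in range(k, m)])
        for j in range(k, n):
            s = beta * sum(v[i - k] * B[i][j] for i in range(k, m))
            for i in range(k, m):
                B[i][j] -= s * v[i - k]
        if k < n - 2:
            # Right reflection: zero row k to the right of the superdiagonal.
            v, beta = householder_vector(B[k][k + 1:])
            for i in range(k, m):
                s = beta * sum(v[j - k - 1] * B[i][j] for j in range(k + 1, n))
                for j in range(k + 1, n):
                    B[i][j] -= s * v[j - k - 1]
    return B
```

Each step consists of a matrix-vector product followed by a rank-one update, and the active submatrix shrinks by one row and one column per iteration. That shrinking workload is one plausible source of the load imbalance the abstract mentions for cluster implementations.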