Implementation and Tuning of Batched Cholesky Factorization and Solve for NVIDIA GPUs
2016
IEEE Transactions on Parallel and Distributed Systems
Many problems in engineering and scientific computing require the solution of a large number of small systems of linear equations. Because of their high processing power, Graphics Processing Units (GPUs) have become an attractive target for this class of problems, and NVIDIA's cuBLAS library provides routines based on the LU and QR factorizations. This work addresses the case where the systems of equations are symmetric positive definite. The paper describes the implementation and
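The abstract does not spell out the algorithm. As a rough CPU-side illustration of what a batched Cholesky factorization and solve computes, the sketch below factors each small symmetric positive definite matrix as A = L Lᵀ and then solves L y = b and Lᵀ x = y by forward and back substitution. It is a plain-Python reference written under our own assumptions: the function names `cholesky`, `solve_cholesky` and `batched_cholesky_solve` are hypothetical, and this is neither the paper's GPU kernels nor a cuBLAS API.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def cholesky(a: Matrix) -> Matrix:
    """Return lower-triangular L with A = L L^T (row-by-row Cholesky)."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                d = a[i][i] - s
                # A non-positive pivot means A is not symmetric positive definite.
                if d <= 0.0:
                    raise ValueError("matrix is not symmetric positive definite")
                l[i][j] = math.sqrt(d)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def solve_cholesky(l: Matrix, b: Vector) -> Vector:
    """Solve A x = b given the Cholesky factor L of A."""
    n = len(l)
    # Forward substitution: L y = b
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    # Back substitution: L^T x = y (L^T[i][k] is L[k][i])
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x


def batched_cholesky_solve(mats: List[Matrix], rhs: List[Vector]) -> List[Vector]:
    """Factor and solve each independent small system in turn.

    The systems share no data, which is what makes the batch easy to
    parallelize; a GPU version could assign one system per thread group
    (an assumption here, not a description of the paper's kernels).
    """
    return [solve_cholesky(cholesky(a), b) for a, b in zip(mats, rhs)]
```

For example, `batched_cholesky_solve([[[4.0, 2.0], [2.0, 3.0]], [[9.0, 0.0], [0.0, 1.0]]], [[2.0, 1.0], [18.0, 5.0]])` returns `[[0.5, 0.0], [2.0, 5.0]]`. Cholesky is preferred over LU here because, for symmetric positive definite matrices, it needs no pivoting and roughly half the arithmetic.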
doi:10.1109/tpds.2015.2481890
fatcat:uglchqysozci3f6hgey373tphu