Hierarchical Approach for Deriving a Reproducible LU factorization

Roman Iakymchuk, Stef Graillat, David Defour, Enrique Quintana-Ortí
unpublished
We propose a reproducible variant of the unblocked LU factorization for graphics processing units (GPUs). For this purpose, we build upon Level-1/2 BLAS kernels that deliver correctly rounded and reproducible results for the dot (inner) product, vector scaling, and the matrix-vector product. In addition, we outline a strategy to enhance the accuracy of the triangular solve via iterative refinement. Following a bottom-up approach, we finally construct a reproducible unblocked implementation of the factorization for GPUs, which accommodates partial pivoting for stability and can eventually be integrated into a high-performance and stable algorithm for the (blocked) LU factorization.
fatcat:s4ybmvimdrdjfivtxbpaa4kvay
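
The bottom-up construction described in the abstract can be illustrated with a minimal CPU sketch in Python. It is not the authors' GPU implementation: exact rational arithmetic (`fractions.Fraction`) is used here only as a stand-in for a correctly rounded dot product, and the names `exact_dot_update` and `lu_unblocked` are hypothetical. Iterative refinement of the triangular solve is not shown.

```python
from fractions import Fraction


def exact_dot_update(c, xs, ys):
    """Return c - sum(x*y), accumulated exactly and rounded once at the end.

    Because the accumulation is exact, the result does not depend on the
    order in which the terms are visited (assumption: finite float inputs).
    """
    acc = Fraction(c)
    for x, y in zip(xs, ys):
        acc -= Fraction(x) * Fraction(y)
    return float(acc)  # single, correctly rounded conversion


def lu_unblocked(matrix):
    """Unblocked LU with partial pivoting (Crout-style ordering).

    Returns (lu, perm): lu holds the unit lower factor below the diagonal
    and the upper factor on and above it; row i of the permuted input is
    matrix[perm[i]].
    """
    n = len(matrix)
    a = [[float(v) for v in row] for row in matrix]  # work on a copy
    perm = list(range(n))
    for k in range(n):
        # Matrix-vector-like step: update column k from the diagonal down.
        u_col = [a[m][k] for m in range(k)]
        for i in range(k, n):
            a[i][k] = exact_dot_update(a[i][k], a[i][:k], u_col)
        # Partial pivoting: first row with the largest magnitude in column k.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            raise ZeroDivisionError("zero pivot: matrix is singular")
        if p != k:
            a[k], a[p] = a[p], a[k]
            perm[k], perm[p] = perm[p], perm[k]
        # Vector scaling: each division is correctly rounded on its own.
        for i in range(k + 1, n):
            a[i][k] /= a[k][k]
        # Dot-product step: finish row k of the upper factor.
        for j in range(k + 1, n):
            u_col = [a[m][j] for m in range(k)]
            a[k][j] = exact_dot_update(a[k][j], a[k][:k], u_col)
    return a, perm
```

For example, `lu_unblocked([[2.0, 1.0], [4.0, 3.0]])` swaps the two rows and returns `([[4.0, 3.0], [0.5, -0.5]], [1, 0])`. Exact rationals keep the sketch short but are far too slow for production use; they serve only to make the single-rounding property visible.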