Improved load distribution in parallel sparse Cholesky factorization

Edward Rothberg, Robert Schreiber
1994 Supercomputing, Proceedings  
Compared to the customary column-oriented approaches, block-oriented, distributed-memory sparse Cholesky factorization benefits from an asymptotic reduction in interprocessor communication volume and an asymptotic increase in the amount of concurrency that is exposed in the problem. Unfortunately, block-oriented approaches (specifically, the block fan-out method) have suffered from poor balance of the computational load. As a result, achieved performance can be quite low. This paper investigates the reasons for this load imbalance and proposes simple block mapping heuristics that dramatically improve it. The result is a roughly 20% increase in realized parallel factorization performance, as demonstrated by performance results from an Intel Paragon™ system. We have achieved performance of nearly 3.2 billion floating point operations per second with this technique on a 196-node Paragon system.
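The abstract does not spell out the mapping heuristics themselves. To make the load-balance problem concrete, here is a minimal Python sketch that compares a 2-D cyclic mapping of blocks onto a Pr × Pc processor grid (assumed here as the baseline) against a hypothetical greedy heuristic that assigns block rows and block columns to processor rows and columns in decreasing-work order. The work model (block (i, j) costs j + 1 units), the function names, and the greedy rule are illustrative assumptions, not the paper's method. Balance is measured as mean processor load divided by maximum processor load, so 1.0 means perfect balance.

```python
def synthetic_work(n):
    """Hypothetical work model: a dense lower-triangular block matrix where
    block (i, j), j <= i, costs j + 1 units (a stand-in for real block costs)."""
    return {(i, j): float(j + 1) for i in range(n) for j in range(i + 1)}


def row_col_weights(work, n):
    """Total work in each block row and each block column."""
    rows, cols = [0.0] * n, [0.0] * n
    for (i, j), w in work.items():
        rows[i] += w
        cols[j] += w
    return rows, cols


def cyclic_map(n, p):
    """Baseline cyclic mapping: block index k goes to processor row/col k mod p."""
    return [k % p for k in range(n)]


def greedy_map(weights, p):
    """Hypothetical greedy heuristic: visit indices in decreasing weight order
    and give each one to the currently least-loaded of p processor rows/cols."""
    loads = [0.0] * p
    mapping = [0] * len(weights)
    # sorted() is stable, so equal weights keep their original index order.
    for k in sorted(range(len(weights)), key=lambda k: -weights[k]):
        target = min(range(p), key=lambda b: loads[b])
        mapping[k] = target
        loads[target] += weights[k]
    return mapping


def processor_loads(work, row_map, col_map, pr, pc):
    """Block (i, j) is owned by grid processor (row_map[i], col_map[j])."""
    loads = [[0.0] * pc for _ in range(pr)]
    for (i, j), w in work.items():
        loads[row_map[i]][col_map[j]] += w
    return loads


def balance(loads):
    """Mean load over maximum load; 1.0 means perfectly balanced."""
    flat = [x for row in loads for x in row]
    return (sum(flat) / len(flat)) / max(flat)


if __name__ == "__main__":
    n, pr, pc = 4, 2, 2
    work = synthetic_work(n)
    rows, cols = row_col_weights(work, n)
    cyc = processor_loads(work, cyclic_map(n, pr), cyclic_map(n, pc), pr, pc)
    grd = processor_loads(work, greedy_map(rows, pr), greedy_map(cols, pc), pr, pc)
    print("cyclic balance:", balance(cyc))
    print("greedy balance:", balance(grd))
```

On this toy example (4 × 4 blocks, 2 × 2 grid, 20 units of work in total), the cyclic map gives one processor 8 units against a mean of 5, for a balance of 0.625, while the greedy map caps the maximum at 7 units, for a balance of 5/7 ≈ 0.71. In a real sparse factor, per-block work would come from the symbolic factorization rather than from a formula, and a per-dimension greedy assignment improves but does not guarantee per-processor balance.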
doi:10.1145/602896.602897 fatcat:onxkoeq25jcxdovzqmxfxlsaui