An out-of-core implementation of the COLUMBUS massively-parallel multireference configuration interaction program
Proceedings of the IEEE/ACM SC98 Conference
In this paper, we describe a novel parallelization approach that we developed to solve the largest multireference configuration interaction (MRCI) problem ever attempted. From the mathematical perspective, the program solves the eigenvalue problem for a very large, sparse, symmetric Hamilton matrix. Using an out-of-core approach, a shared-memory programming model, improved data compression algorithms, and dynamic load balancing, we were able to solve a problem six times larger than any previously reported.
... The potential curve for the chromium dimer was calculated with a Hamilton matrix of dimension 1.3 billion (1,295,937,374). This task involved moving 1.5 terabytes of data between main memory and secondary storage per MRCI iteration. Furthermore, by employing Active Messages and user-level striping to combine multiple files on local disks of the IBM SP into a single logically shared file, we reduced the execution time of the program by a factor of three compared to our initial implementation on top of the IBM PIOFS parallel filesystem.

... Laboratory, Argonne National Laboratory, and the University of Vienna. Over a period of about seven years, the parallel efficiency of the program has increased from a paltry 50% on about 8 nodes of an Intel distributed-memory computer to over 94% on 512 nodes of a CRAY-T3E. Most of this improvement in scalability results from extensive algorithmic modifications to expose greater parallelism, eliminate overhead of the parallel algorithm relative to the sequential one, eliminate disk I/O, and reduce communication and memory usage, in part by using data compression. Until now, however, the problem size that could be treated with this code was limited by the aggregate available memory to problems not much larger than those that have long been solved using out-of-core techniques on vector supercomputers. This paper describes our implementation, in the COLUMBUS MRCI program, of an out-of-core approach that performs well on the IBM SP massively parallel computer. Because the program is implemented on top of portable programming tools and libraries, it can be used on other platforms. We demonstrate the capability of this program by applying it to a problem six times larger than any previously reported.
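To make the idea of user-level striping concrete, the following minimal sketch shows one way a byte offset in a single logically shared file could be mapped onto several per-node local files. It is an illustration only: the round-robin layout, the class name `StripeLayout`, and the parameters `num_files` and `stripe_size` are assumptions made for this example and are not taken from the COLUMBUS implementation, and the remote access via Active Messages is not modeled at all.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class StripeLayout:
    """Hypothetical round-robin striping of one logical file over local files."""
    num_files: int    # number of local files, e.g. one per node disk (assumed)
    stripe_size: int  # bytes per stripe unit (assumed)

    def locate(self, offset: int) -> Tuple[int, int]:
        """Map a logical byte offset to (file index, offset within that file)."""
        if offset < 0:
            raise ValueError("offset must be non-negative")
        stripe, within = divmod(offset, self.stripe_size)
        file_index = stripe % self.num_files      # which local file holds the stripe
        local_stripe = stripe // self.num_files   # position of the stripe in that file
        return file_index, local_stripe * self.stripe_size + within

    def split(self, offset: int, length: int) -> List[Tuple[int, int, int]]:
        """Break a logical request into (file index, local offset, length) pieces."""
        pieces = []
        end = offset + length
        while offset < end:
            file_index, local = self.locate(offset)
            # Bytes left before the current stripe unit ends.
            room = self.stripe_size - (offset % self.stripe_size)
            chunk = min(room, end - offset)
            pieces.append((file_index, local, chunk))
            offset += chunk
        return pieces


if __name__ == "__main__":
    layout = StripeLayout(num_files=4, stripe_size=100)
    # A 200-byte request starting at logical offset 350 touches three local files.
    for piece in layout.split(350, 200):
        print(piece)
```

In a layout of this kind, a large sequential request is spread over all local disks, so their bandwidth can be used in aggregate; the stripe size trades the number of per-file requests against how evenly the load is distributed.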