Nonuniformly Communicating Noncontiguous Data: A Case Study with PETSc and MPI
2007 IEEE International Parallel and Distributed Processing Symposium
Due to the complexity associated with developing parallel applications, scientists and engineers rely on high-level software libraries such as PETSc, ScaLAPACK and PESSL to ease this task. Such libraries assist application developers by providing abstractions for mathematical operations, data representation and management of parallel layouts of the data, while internally using communication libraries such as MPI and PVM. With high-level libraries managing data layout and communication internally, it can be expected that they organize application data suitably for performing the library operations optimally. However, this places additional overhead on the underlying communication library by making the data layout noncontiguous in memory and the communication volumes (the data transferred to each process) nonuniform. In this paper, we analyze the overheads associated with these two aspects (noncontiguous data layouts and nonuniform communication volumes) in the context of the PETSc software toolkit over the MPI communication library. We describe the issues with the current approaches used by MPICH2 (an implementation of MPI), propose different approaches to handle these issues, and evaluate these approaches with microbenchmarks as well as an application over the PETSc software library. Our experimental results demonstrate close to an order of magnitude improvement in the performance of a 3-D Laplacian multi-grid solver application when evaluated on a 128-processor cluster.

* The authors would like to thank Dr. Panda and his team from the Ohio State University for allowing us access to their 64-node InfiniBand cluster.
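To make the two sources of overhead concrete, the following is a minimal sketch in plain C, not the approach proposed in this paper and not MPICH2's implementation. It assumes two hypothetical helpers: `pack_strided`, which copies a strided (noncontiguous) region into a contiguous buffer, the same layout that `MPI_Type_vector` describes, and `counts_to_displs`, which converts nonuniform per-process send counts into the displacement array that `MPI_Alltoallv` expects alongside its counts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (illustration only): gather `count` blocks of
 * `blocklen` doubles, whose starts lie `stride` elements apart in `src`,
 * into the contiguous buffer `dst`. This mirrors the layout described by
 * MPI_Type_vector(count, blocklen, stride, ...).
 * Returns the number of elements packed. */
size_t pack_strided(const double *src, size_t count, size_t blocklen,
                    size_t stride, double *dst)
{
    assert(stride >= blocklen); /* blocks must not overlap */
    for (size_t i = 0; i < count; i++) {
        memcpy(dst + i * blocklen, src + i * stride,
               blocklen * sizeof *src);
    }
    return count * blocklen;
}

/* Hypothetical helper (illustration only): compute displacements from
 * nonuniform per-process counts with an exclusive prefix sum, the form
 * taken by the counts and displacements arguments of MPI_Alltoallv.
 * Returns the total number of elements across all processes. */
int counts_to_displs(const int *counts, int nprocs, int *displs)
{
    int total = 0;
    for (int p = 0; p < nprocs; p++) {
        displs[p] = total;
        total += counts[p];
    }
    return total;
}
```

For example, extracting the first column of a 3 x 4 row-major grid corresponds to `pack_strided(grid, 3, 1, 4, col)`: three blocks of one element each, four elements apart. Send counts such as `{5, 0, 2, 1}` illustrate a nonuniform volume, where one process receives most of the data and another receives none.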