CGMGRAPH/CGMLIB: Implementing and Testing CGM Graph Algorithms on PC Clusters and Shared Memory Machines

Albert Chan, Frank Dehne, Ryan Taylor
The International Journal of High Performance Computing Applications, 2005
In this paper, we present CGMgraph, the first integrated library of parallel graph methods for PC clusters based on Coarse Grained Multicomputer (CGM) algorithms. CGMgraph implements parallel methods for various graph problems. Our implementations of deterministic list ranking, Euler tour, connected components, spanning forest, and bipartite graph detection are, to our knowledge, the first efficient implementations for PC clusters. Our library also includes CGMlib, a library of basic CGM tools such as sorting, prefix sum, one-to-all broadcast, all-to-one gather, h-Relation, all-to-all broadcast, array balancing, and CGM partitioning. Both libraries are available for download at http://www.scs.carleton.ca/~cgm. In the experimental part of this paper, we demonstrate the performance of our methods on four different architectures: a gigabit connected high performance PC cluster, a smaller PC cluster connected via fast ethernet, a network of workstations, and a shared memory machine. Our experiments show that our library provides good parallel speedup and scalability on all four platforms. The communication overhead is, in most cases, small and does not grow significantly with an increasing number of processors. This is a very important feature of CGM algorithms, which makes them very efficient in practice.

Introduction

In this paper, we present CGMgraph, the first integrated library of Coarse Grained Multicomputer (CGM; Dehne et al. 1993) methods for graph problems, including list ranking, Euler tour, connected components, spanning forest, and bipartite graph recognition. Our library also includes CGMlib, a library of basic CGM tools that are necessary for parallel graph methods as well as for many other CGM algorithms: sorting, prefix sum, one-to-all broadcast, all-to-one gather, h-Relation, all-to-all broadcast, array balancing, and CGM partitioning. In comparison with Guérin Lassous et al. (2000), CGMgraph implements both a randomized and a deterministic list ranking method. Our experimental results for randomized list ranking are similar to those reported in Guérin Lassous et al. (2000). Our implementations of deterministic list ranking, Euler tour, connected components, spanning forest, and bipartite graph recognition are, to our knowledge, the first efficient implementations for PC clusters. CGMgraph and CGMlib are based on the CGM/BSP model (Valiant 1990; Dehne et al. 1993) and are optimized for PC clusters.
In the experimental part of this paper, we show the performance of our methods on four different architectures: THOG, CGM1, ULTRA and SUNFIRE. THOG is a gigabit connected high performance cluster, CGM1 is a smaller cluster connected via fast ethernet, ULTRA is a network of workstations, and SUNFIRE is a shared memory machine. Our experiments show that our library provides good relative parallel speedup and scalability on all four platforms. The communication overhead is, in most cases, small and does not grow significantly with an increasing number of processors. This is a very important feature of CGM algorithms, which makes them very efficient in practice. In most cases, the communication overhead is dominated by the local computation time, which implies good relative speedup in practice. Both the CGMlib and CGMgraph libraries are freely available for download at http://www.scs.carleton.ca/~cgm, together with a library installation script. The aim of these libraries is to make efficient parallel graph methods available to a wider community of researchers, who can use them as building blocks for other parallel programming projects (Chan and Dehne 2003).
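The link between a small communication overhead and good relative speedup can be made explicit with a back-of-the-envelope model. Assume, as an idealization not taken from this paper, that the running time T(p) of the parallel code on p processors splits into a local computation time T_comp(p) and a communication time T_comm(p), and that local work is perfectly balanced, so T_comp(p) ≈ T(1)/p, where T(1) is the running time of the same parallel code on one processor. The symbols T_comp and T_comm are introduced here only for illustration.

```latex
S(p) = \frac{T(1)}{T(p)}, \qquad T(p) = T_{\mathrm{comp}}(p) + T_{\mathrm{comm}}(p)

T_{\mathrm{comp}}(p) \approx \frac{T(1)}{p}
\;\Longrightarrow\;
S(p) \approx \frac{p \, T_{\mathrm{comp}}(p)}{T_{\mathrm{comp}}(p) + T_{\mathrm{comm}}(p)}
= \frac{p}{1 + T_{\mathrm{comm}}(p) / T_{\mathrm{comp}}(p)}
```

Under these assumptions, as long as the ratio T_comm(p)/T_comp(p) stays small as p grows, the relative speedup S(p) stays close to p.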
doi:10.1177/1094342005051196