Design and Implementation of a Service-integrated Session Layer for Efficient Message Passing in Grid Computing Environments

Carsten Clauss, Stefan Lankes, Thomas Bemmerl
2008 International Symposium on Parallel and Distributed Computing
When running large parallel applications whose resource demands exceed the capacity of the local computing site, deployment in a distributed Grid environment may help to satisfy these demands. However, since such an environment is heterogeneous by nature, there are some drawbacks that limit its applicability if they are not taken into account. First, one has to apply a meta-computing or Grid-enabled message-passing library in order to route messages to remote sites while still being able to exploit fast site-local network facilities. Second, because the inter-site communication usually constitutes the system's bottleneck, appropriate quality-of-service parameters should be provided and policed for those connections during the application's execution. Finally, the parallel runtime environment of the distributed application should offer service interfaces in order to enable its interaction with Grid middleware. In this paper, we present a new library called ISI whose functionality meets those requirements in terms of a session layer to be integrated into Grid-enabled message-passing implementations.

Grid environments [9] can help to satisfy the resource demands of such large applications. Advances in wide-area networking technology have fostered this trend towards geographically distributed high-performance parallel computing in recent years. However, as Grid resources are usually heterogeneous by nature, so are the communication characteristics. In particular, inter-site communication often constitutes a bottleneck, with higher latencies and lower bandwidths than in the site-internal case. The reason is that inter-site communication is typically handled via wide-area transport protocols and the respective networks, whereas internal communication is conducted via fast local-area networks or even via dedicated high-performance interconnects. This in turn means that efficient utilization of such a hierarchical and heterogeneous infrastructure demands a communication middleware that supports all these different kinds of networks and transport protocols.

Grid-enabled MPI

Since MPI [18] is the most important API for implementing parallel programs for large-scale environments, some MPI libraries have already been extended to meet these demands of distributed and heterogeneous computing.
Those libraries are often called Grid-enabled because they do not rely solely on plain TCP/IP (obviously the lowest common denominator) for all inter-process communication, but can also exploit fast local networks and interconnect facilities in order to accommodate the Grid's hierarchy. Hence, to support the various high-performance cluster networks and their specific communication protocols, most of those libraries in turn rely on other high-level communication libraries (such as site-native MPI libraries) rather than implementing this support themselves. Such a Grid-enabled MPI library can therefore be understood as a kind of meta-layer that bridges the distributed computing sites, and for that reason its application area is also referred to as a meta-computing environment. The most common meta-computing and Grid-enabled MPI libraries are MPICH-G2 [15], PACX-MPI [1], GridMPI [16], StaMPI [25] and MetaMPICH [21], all of which have been proven to run large-scale applications in distributed environments. Although these meta-MPI implementations usually use native MPI support for site-internal communication, as for example provided by a site-local vendor MPI,
doi:10.1109/ispdc.2008.41 dblp:conf/ispdc/ClaussLB08 fatcat:dwqx5jk47jgtzcdpxstxky3lbq