Optimizing Metacomputing with Communication-Computation Overlap [chapter]

Françoise Baude, Denis Caromel, Nathalie Furmento, David Sagnol
2001 Lecture Notes in Computer Science  
In the framework of distributed object systems, this paper presents the concepts and an implementation of a mechanism for overlapping communication with computation. This mechanism reduces the execution time of a remote method invocation with large parameters. Its implementation and related experiments in the C++// language, running on top of Globus and Nexus, are described.
Formulation of the Problem

The general idea behind the concept of overlapping is that, during a remote computation dealing with large data requiring transmission, communication and computation are automatically split into steps with a smaller data volume; it is then only a question of pipelining these steps in order to overlap the current step of the remote computation with the data transmission related to the next step. This requires executing a computation step and a transmission step at the same time. One way to achieve this is to use non-blocking communications.

Schematically, in the SPMD or SIMD programming models, a similar computation has to be executed on each element of a large but fixed-size data structure. The compiler or the run-time system is thus quite easily able to split it into small pieces, send each one in turn, and apply the computation on each piece once it is received. If the compiler or the run-time system is not able to automatically decide how to split the data, the programmer can help. The implementation of this technique has therefore generally been restricted to the field of data-parallel languages for parallel architectures with distributed memory: HPF [3], Fortran D [17], but also LOCCS [8], a library of communication and computation routines.

But how should the same problem be tackled in the area of distributed object-oriented languages? In this context, the whole computation taking place on the distributed entities can be expressed as remote service invocations through method calls, such as RMI [16] in Java or RPC in C/C++ [2], even if ultimately very low-level communications, e.g., network communications, are used. In order to exhibit parallelism between distributed computations, a solution is to use asynchronous, i.e., non-blocking, service invocations instead of the blocking ones featured by classical RPCs. Many models and languages have exploited this idea [4]. In particular, we have designed and implemented distributed extensions to object-oriented languages such as Eiffel, C++ and Java, that enforce sequential code reuse in a parallel and distributed setting [6, 7]. In such language extensions, each service invocation can be executed in parallel with the ongoing computation. Once the result of the service is required, a wait-by-necessity mechanism comes into play [5]. More information related to this model will be given in Sect. 3. In the implementation of such remote method invocation-based settings, all arguments of the method call must generally be received before the method execution starts.

Main idea. The essence of our proposition is thus to apply a classical pipelining idea to the arguments of a remote call: once the first part of the arguments has arrived, the method execution is able to start. Moreover, it is only the type of the arguments that automatically indicates how to split the data to send. In this way, programmers are able to express, at a very high level, opportunities to overlap communications with computation. Optimisation of the parameter copying process, as in [18], is a different but complementary approach.
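To make the pipelining idea concrete, the following minimal C++ sketch simulates the transmission of a large method argument as a stream of chunks over a hypothetical Channel class standing in for the network transport (the actual C++// runtime on Nexus is not shown). The callee's "method body" starts computing on the first chunk while the remaining chunks are still in transit, which is exactly the overlap described above. All names here are illustrative, not part of C++//.

#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <iostream>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// A tiny single-producer/single-consumer channel standing in for the network.
template <typename T>
class Channel {
    std::queue<T> q_;
    std::mutex m_;
    std::condition_variable cv_;
    bool closed_ = false;
public:
    void send(T v) {
        { std::lock_guard<std::mutex> lk(m_); q_.push(std::move(v)); }
        cv_.notify_one();
    }
    void close() {
        { std::lock_guard<std::mutex> lk(m_); closed_ = true; }
        cv_.notify_one();
    }
    bool recv(T& out) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [&] { return !q_.empty() || closed_; });
        if (q_.empty()) return false;          // channel closed, no more chunks
        out = std::move(q_.front());
        q_.pop();
        return true;
    }
};

int main() {
    const std::size_t N = 1 << 20, CHUNK = 1 << 16;
    std::vector<double> data(N, 1.0);
    Channel<std::vector<double>> net;

    // "Caller" side: split the large argument and transmit it chunk by chunk.
    std::thread sender([&] {
        for (std::size_t i = 0; i < N; i += CHUNK)
            net.send(std::vector<double>(data.begin() + i,
                                         data.begin() + std::min(i + CHUNK, N)));
        net.close();
    });

    // "Callee" side: the method body starts as soon as the first chunk
    // arrives; computing on chunk i overlaps the transfer of chunk i+1.
    double sum = 0.0;
    std::vector<double> chunk;
    while (net.recv(chunk))
        for (double x : chunk) sum += x;

    sender.join();
    std::cout << "sum = " << sum << "\n";      // expected: 1048576
}

The chunk size is the key tuning knob in such a pipeline: smaller chunks start the computation earlier and hide more latency, but increase per-message overhead; in the paper's setting this granularity is derived from the type of the arguments rather than chosen by hand.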
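The asynchronous invocation and wait-by-necessity model described earlier in this section can be approximated, for illustration only, with standard C++ futures. C++//'s actual mechanism is transparent and applies to ordinary method calls on remote objects, which this local sketch does not attempt to reproduce; remote_sum is a hypothetical service.

#include <future>
#include <iostream>
#include <numeric>
#include <vector>

// Hypothetical remote service, simulated here by a local function.
double remote_sum(std::vector<double> data) {
    return std::accumulate(data.begin(), data.end(), 0.0);
}

int main() {
    std::vector<double> big(1'000'000, 1.0);

    // Non-blocking "service invocation": the caller continues immediately.
    std::future<double> result =
        std::async(std::launch::async, remote_sum, big);

    // ... the caller overlaps its own computation here ...

    // Wait-by-necessity: we only block when the result is actually used.
    std::cout << result.get() << "\n";         // expected: 1e+06
}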
doi:10.1007/3-540-44743-1_19