A generalized processor mapping technique for array redistribution
IEEE Transactions on Parallel and Distributed Systems
AbstractÐIn many scientific applications, array redistribution is usually required to enhance data locality and reduce remote memory access in many parallel programs on distributed memory multicomputers. Since the redistribution is performed at runtime, there is a performance trade-off between the efficiency of the new data decomposition for a subsequent phase of an algorithm and the cost of redistributing data among processors. In this paper, we present a generalized processor mapping
... to minimize the amount of data exchange for BLOCK-CYCLIC(kr) to BLOCK-CYCLIC(r) array redistribution and vice versa. The main idea of the generalized processor mapping technique is first to develop mapping functions for computing a new rank of each destination processor. Based on the mapping functions, a new logical sequence of destination processors can be derived. The new logical processor sequence is then used to minimize the amount of data exchange in a redistribution. The generalized processor mapping technique can handle array redistribution with arbitrary source and destination processor sets and can be applied to multidimensional array redistribution. We present a theoretical model to analyze the performance improvement of the generalized processor mapping technique. To evaluate the performance of the proposed technique, we have implemented the generalized processor mapping technique on an IBM SP2 parallel machine. The experimental results show that the generalized processor mapping technique can provide performance improvement over a wide range of redistribution problems.