Transforming high-level data-parallel programs into vector operations

Jan F. Prins, Daniel W. Palmer
1993 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming - PPOPP '93  
Fully-parallel execution of a high-level data-parallel language based on nested sequences, higher order functions and generalized iterators can be realized in the vector model using a suitable representation of nested sequences and a small set of transformational rules to distribute iterators through the constructs of the language.

† This work supported in part by DARPA/ISTO Contract N00014-91-C-0114.
1 The Proteus language is a component of the DARPA CPL (Common Prototyping Language) effort.
... -foundation for APL was described by [More79], and can be found in NIAL, APL2, J, SETL and FP [Back78]. Although data-parallel programs are conveniently specified in these languages, they can only be executed sequentially, because a parallel implementation of general data-parallelism has complex, fine-grained synchronization requirements. Thus these languages are not parallel programming languages.

Languages in which data-parallelism is the mechanism used to specify actual parallel computation, such as *Lisp, MPL, and DAP-Fortran, have historically targeted specific SIMD parallel computers. More recent languages like CMFortran and C* are portable across various SIMD and MIMD machines. The aggregates in these languages are restricted to flat arrays distributed in a regular manner over processors, in an effort to predict and minimize the communication required during execution [KLS+90, Prin90]. Because aggregates are flat, only a limited class of arithmetic and logical operations may be applied in a data-parallel fashion. Consequently, these languages cannot directly express nested parallelism: the data-parallel application of a function which is itself data-parallel. For example, a data-parallel sort function cannot be applied in parallel to every sequence in a collection of sequences. Yet this is the key step in any parallel divide-and-conquer sorting algorithm.

Indeed, there is extensive evidence that nested data-parallelism is an important component in the compact expression of efficient parallel computations [Blel90, Skil90, MNP+91]. The difficulty is not in the languages, since general data-parallel languages can easily express nested parallel computations. Rather, the problem lies in translating nested parallelism so as to achieve fully-parallel execution.
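To make the translation problem concrete, the sketch below shows one common way a nested sequence can be encoded as a flat vector of values plus a vector of segment lengths, so that a per-sequence operation (here, a sum) becomes a few operations over flat vectors. This is an illustration only, not the paper's transformation rules: the function names `flatten`, `unflatten` and `segmented_sum` are hypothetical, the encoding handles only one level of nesting, and the Python loops stand in for vector operations that a vector machine would execute in parallel.

```python
from itertools import accumulate


def flatten(nested):
    """Encode a depth-2 nested sequence as (values, lengths).

    Assumed encoding: all elements concatenated in order, plus the
    length of each inner sequence (its "segment").
    """
    values = [x for seq in nested for x in seq]
    lengths = [len(seq) for seq in nested]
    return values, lengths


def unflatten(values, lengths):
    """Rebuild the nested sequence from the flat encoding."""
    offsets = [0] + list(accumulate(lengths))
    return [values[offsets[i]:offsets[i + 1]] for i in range(len(lengths))]


def segmented_sum(values, lengths):
    """Sum every segment using flat-vector steps only.

    One prefix sum over the values, one prefix sum over the lengths,
    then an elementwise difference at the segment boundaries.
    """
    prefix = [0] + list(accumulate(values))
    offsets = [0] + list(accumulate(lengths))
    return [prefix[offsets[i + 1]] - prefix[offsets[i]]
            for i in range(len(lengths))]


nested = [[3, 1, 2], [], [5, 4]]
values, lengths = flatten(nested)
sums = segmented_sum(values, lengths)
```

In this sketch the nested application "sum each inner sequence" never iterates over the inner sequences one at a time; the work is expressed over the flat `values` vector, with `lengths` recording where each segment ends. Empty inner sequences are preserved because they appear as zero-length segments.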
A major step in this direction was taken in [Blel90], which showed that for nested sequence aggregates subject to a restricted set of operations, an equivalent vector model program operating on partitioned (segmented) flat sequences can be derived. The vector model is efficiently executed on a wide class of parallel machines. Building on these techniques, the transformations presented in this paper give a simple mechanism to transform the fully general data-parallelism available in Proteus programs into the vector model.

Related work

Many researchers have addressed the problem of deriving parallel programs by transformation. In this paper, we are concerned specifically with the translation of data-parallelism, so we restrict our review of related work to that concerned with the implementation of nested parallelism. CM Lisp [SH86] and Paralation Lisp [Sabo88] are fully-general data-parallel languages implemented as high-level programming languages for the Connection Machine. However, implementations of these languages apply nested data-parallel operations in a serial fashion. McCrosky [McCr87] describes a way to represent the nested arrays of APL and gives implementations for APL primitives on a SIMD execution model, but does not address nested parallel execution either. Philippsen [PTH91] describes an implementation of nested parallelism in Modula-2* for a SIMD computer, but Modula-2* has no data-parallel nested aggregates. So while nested parallel operations may be applied, the programmer must
doi:10.1145/155332.155345 dblp:conf/ppopp/PrinsP93 fatcat:gnunajtjcnahfhddeacecsk7ma