MPI microtask for programming the Cell Broadband Engine™ processor

M. Ohara, H. Inoue, Y. Sohda, H. Komatsu, T. Nakatani
2006 IBM Systems Journal  
The Cell Broadband Engine™ processor employs multiple accelerators, called synergistic processing elements (SPEs), for high performance. Each SPE has a high-speed local store attached to the main memory through direct memory access (DMA), but a drawback of this design is that the local store is not large enough for the entire application code or data. The application must be decomposed into pieces small enough to fit into the local store, and these pieces must be replaced through DMA without losing the performance gain of multiple SPEs. We propose a new programming model, MPI microtask, based on the standard Message Passing Interface (MPI) programming model for distributed-memory parallel machines. In our new model, programmers do not need to manage the local store as long as they partition their application into a collection of small microtasks that fit into the local store. Furthermore, the preprocessor and runtime in our microtask system optimize the execution of the microtasks by exploiting explicit communications in the MPI model. We have created a prototype that includes a novel static scheduler for such optimizations. Our initial experiments have shown some encouraging results.

... [9] they are not directly applicable to the Cell BE processor. This is because of key differences in the architectural characteristics: existing algorithms assume loosely coupled coarse-grain multiprocessors, where each processor has a large local memory but the communication latency between processors is very large. The Cell BE processor, on the other hand, is a tightly coupled fine-grain multicore processor where each SPE has a small local memory but the communication latency between SPEs is very small. These differences have led us to a new clustering approach in our static scheduling algorithm.

The contribution of this paper is twofold. First, we propose a microtask model for the Cell BE processor. It frees programmers from explicit local-store management, which could be a significant burden for them. Second, we propose a novel scheduling algorithm that converts a microtask program into one for a streaming model, which the Cell BE processor can execute efficiently.

RELATED WORK

The microtask model is compared with other programming models proposed for the Cell BE processor and similar architectures, and related work in static scheduling algorithms is discussed.

PPE-centric versus SPE-centric programming models

Kahle et al.
[2] proposed two approaches to map application programs to the Cell BE processor: the function offload model and the computational acceleration model. The function offload model is a PPE-centric ...
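
To make concrete the kind of manual local-store management that the microtask model is meant to hide, the following is a minimal Python sketch, not the authors' preprocessor, runtime, or scheduler. It splits a data set into pieces that fit a fixed local-store budget and processes them one after another while holding the next piece in a second buffer. The 256 KB figure is the SPE local store size; the reserved size for code and stack, the two-buffer layout, and all function names are assumptions made for illustration. List slicing stands in for DMA transfers, and the simulation is sequential, so it shows only the buffering structure, not real overlap of transfer and computation.

```python
LOCAL_STORE_BYTES = 256 * 1024  # SPE local store size on the Cell BE
RESERVED_BYTES = 64 * 1024      # assumed space for code and stack (hypothetical)


def chunk_size(item_bytes, buffers=2):
    """Items per buffer such that `buffers` buffers fit in the data budget."""
    budget = LOCAL_STORE_BYTES - RESERVED_BYTES
    return budget // (buffers * item_bytes)


def split_into_pieces(n_items, item_bytes):
    """Return (start, stop) index ranges, each small enough for one buffer."""
    step = chunk_size(item_bytes)
    if step == 0:
        raise ValueError("a single item does not fit into one buffer")
    return [(start, min(start + step, n_items))
            for start in range(0, n_items, step)]


def run_streamed(data, item_bytes, kernel):
    """Process `data` piece by piece with a two-buffer (double-buffered) layout.

    Slicing stands in for a DMA transfer into the local store. On real
    hardware the fetch of the next piece would overlap the computation on
    the current one; here the two steps simply run one after the other.
    """
    ranges = split_into_pieces(len(data), item_bytes)
    results = []
    if not ranges:
        return results
    first_start, first_stop = ranges[0]
    next_buf = data[first_start:first_stop]  # initial "DMA get"
    for i in range(len(ranges)):
        current = next_buf
        if i + 1 < len(ranges):
            start, stop = ranges[i + 1]
            next_buf = data[start:stop]  # "prefetch" the following piece
        results.extend(kernel(current))  # compute on the current buffer
    return results


if __name__ == "__main__":
    values = list(range(50000))
    doubled = run_streamed(values, 4, lambda piece: [2 * x for x in piece])
    print(len(split_into_pieces(len(values), 4)), "pieces,",
          len(doubled), "results")
```

In this sketch the piece boundaries depend only on the item size and the assumed budget, so changing `RESERVED_BYTES` or the number of buffers changes how many pieces are produced without touching the kernel.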
doi:10.1147/sj.451.0085 fatcat:6jekxrtumfb6ldz6g6hkh54lze