Feasibility study of MPI implementation on the heterogeneous multi-core Cell BE™ architecture

Arun Kumar, Naresh Jayam, Ashok Srinivasan, Ganapathy Senthilkumar, Pallav K. Baruah, Shakti Kapoor, Murali Krishna, Raghunath Sarma
2007 Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures - SPAA '07  
The Cell Broadband Engine™ is a new heterogeneous multi-core processor from IBM, Sony, and Toshiba. It contains eight coprocessors, called Synergistic Processing Elements (SPEs), which operate directly on distinct 256 KB local stores and also have access to a shared 512 MB to 2 GB main memory. The combined peak speed of the SPEs is 204.8 Gflop/s in single precision and 14.64 Gflop/s in double precision. There is therefore much interest in using the Cell BE™ for high-performance computing applications. However, the unconventional architecture of the SPEs, in particular their local stores, creates some programming challenges. We describe our implementation of certain core features of MPI, such as blocking point-to-point calls and collective communication calls, which can help meet these challenges by enabling a large class of MPI applications to be ported to the Cell BE™ processor. This implementation views each SPE as a node for an MPI process. We store the application data in main memory in order to avoid being limited by the local store size. The local store is abstracted in the library and thus hidden from the application with respect to MPI calls. We have achieved bandwidth up to 6.01 GB/s and latency as low as 0.41 μs on the ping-pong test. The contribution of this work lies in (i) demonstrating that the Cell BE™ has good potential for running intra-Cell BE™ MPI applications, (ii) enabling such applications to be ported to the Cell BE™ with minimal effort, and (iii) evaluating the performance impact of different design choices.
doi:10.1145/1248377.1248387 dblp:conf/spaa/KumarJSSBKKS07 fatcat:7wpuwyxqovexdenrp3n57mcnhy