PGAS-FMM: Implementing a distributed fast multipole method using the X10 programming language
Concurrency and Computation
The fast multipole method (FMM) is a complex, multi-stage algorithm over a distributed tree data structure, with multiple levels of parallelism and inherent data locality. X10 is a modern partitioned global address space language with support for asynchronous activities. The parallel tasks comprising FMM may be expressed in X10 using a scalable pattern of activities. This paper demonstrates the use of X10 to implement FMM for simulation of electrostatic interactions between ions in a cyclotron
... esonance mass spectrometer. X10's task-parallel model is used to express parallelism using a pattern of activities mapping directly onto the tree. X10's work stealing runtime handles load balancing fine-grained parallel activities, avoiding the need for explicit work sharing. The use of global references and active messages to create and synchronize parallel activities over a distributed tree structure is also demonstrated. In contrast to previous simulations of ion trajectories in cyclotron resonance mass spectrometers, our code enables both simulation of realistic particle numbers and guaranteed error bounds. Single-node performance is comparable with the fastest published FMM implementations, and critical expansion operators are faster for high accuracy calculations. A comparison of parallel and sequential codes shows the overhead of activity management and work stealing in this application is low. Scalability is evaluated for 8k cores on a Blue Gene/Q system and 512 cores on a Nehalem/InfiniBand cluster.