A UPC++ Actor Library and Its Evaluation On a Shallow Water Proxy Application

Alexander Pöppl, Scott Baden, Michael Bader
2019 IEEE/ACM Parallel Applications Workshop, Alternatives To MPI (PAW-ATM)
… offer their expected performance. Currently, many applications still follow the Bulk Synchronous Parallel (BSP) model, with clearly defined phases for computation, communication and synchronization. The most widely used approach is to use MPI for inter-node communication and parallelization, and OpenMP for on-node parallelization. The BSP approach enables a clear separation of concerns, but its structure, especially the synchronization step at the end, may be too rigid to obtain the best performance. As the number of nodes increases, so will the difficulty of maintaining the pure BSP model, and with it the burden on the application programmer.

A promising model is the Partitioned Global Address Space (PGAS) programming model [2]. This model assumes a global address space, but exposes the separate physical address domains. This may ease the burden on application programmers, as they no longer need to think in terms of message passing, but can access data on remote ranks directly. Another promising model is the task-based programming model [3]. Here, the programmer specifies pieces of computation and communication as tasks, together with their dependencies. The resulting task graph is then handed to a scheduling system that maps the tasks onto the available computing resources. This model has been implemented in OpenMP [4] and in runtime systems, for example in StarPU, which enables distributed task scheduling on heterogeneous machines [5], or in the AllScale project [6], which aims to separate the specification of parallelism from its low-level management on the target hardware. Task-based parallelism has been employed successfully in complex applications, for example in the Uintah application framework [7].

In the Invasive Computing project, we investigate novel approaches to using future parallel and heterogeneous computers [8]. Most of the research is focused on the project's own hardware architecture, a cache-incoherent heterogeneous Multiprocessor System-on-Chip (MPSoC). This architecture features multiple small groups of CPU cores (called tiles) that share a cache hierarchy and memory. The different tiles are connected by a Network-on-Chip. There are different types of tiles, such as tiles containing normal CPU
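To make the BSP structure described above concrete, the following is a minimal sketch that does not come from the paper. It simulates several ranks sequentially in one process instead of using MPI. Each rank owns a chunk of a 1D array plus one ghost cell on each side. Every superstep runs a local computation (a three-point average), then a communication phase (a halo exchange of boundary values), then a synchronization point. In this sequential simulation the barrier is implicit. The names `Rank`, `exchange_halos`, `compute` and `run_bsp` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simulated rank: [ghost, owned cells..., ghost].
struct Rank {
    std::vector<double> u;
};

// Communication phase: each rank copies its neighbours' boundary values into
// its ghost cells. Only ghost cells are written, so the order of ranks does
// not matter. The global boundaries use a fixed ghost value of 0.0.
void exchange_halos(std::vector<Rank>& ranks) {
    for (std::size_t r = 0; r < ranks.size(); ++r) {
        std::vector<double>& u = ranks[r].u;
        const std::size_t n = u.size() - 2;
        u[0] = (r > 0) ? ranks[r - 1].u[ranks[r - 1].u.size() - 2] : 0.0;
        u[n + 1] = (r + 1 < ranks.size()) ? ranks[r + 1].u[1] : 0.0;
    }
}

// Computation phase: purely local work on owned cells, using a copy of the
// previous values (including the ghost cells filled by the last exchange).
void compute(std::vector<Rank>& ranks) {
    for (Rank& rank : ranks) {
        const std::vector<double> old = rank.u;
        for (std::size_t i = 1; i + 1 < old.size(); ++i)
            rank.u[i] = (old[i - 1] + old[i] + old[i + 1]) / 3.0;
    }
}

// Splits `global` evenly across `num_ranks` simulated ranks, runs `steps`
// supersteps and gathers the owned cells back into one vector.
std::vector<double> run_bsp(const std::vector<double>& global,
                            std::size_t num_ranks, int steps) {
    assert(num_ranks > 0 && global.size() >= num_ranks &&
           global.size() % num_ranks == 0);
    const std::size_t n = global.size() / num_ranks;
    std::vector<Rank> ranks(num_ranks);
    for (std::size_t r = 0; r < num_ranks; ++r) {
        ranks[r].u.assign(n + 2, 0.0);
        for (std::size_t i = 0; i < n; ++i) ranks[r].u[i + 1] = global[r * n + i];
    }
    exchange_halos(ranks);  // initial communication before the first superstep
    for (int s = 0; s < steps; ++s) {
        compute(ranks);         // 1. computation on local data only
        exchange_halos(ranks);  // 2. communication of new boundary values
        // 3. synchronization: implicit here, because every simulated rank has
        //    finished both phases before the next superstep begins.
    }
    std::vector<double> result;
    for (const Rank& rank : ranks)
        result.insert(result.end(), rank.u.begin() + 1, rank.u.end() - 1);
    return result;
}
```

Because every rank only reads data that was exchanged in the previous communication phase, `run_bsp(g, 4, steps)` produces exactly the same values as the single-rank run `run_bsp(g, 1, steps)`. The rigid alternation of phases is what makes the model easy to reason about. It is also the source of the synchronization overhead mentioned above.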
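The PGAS idea of a single global address space that still exposes separate physical address domains can be illustrated with a block-distributed array. The following is a hedged sketch only. It does not use the UPC++ API, and the class `GlobalArray` with its methods `locate`, `is_local`, `get` and `put` is hypothetical. Each simulated rank owns one contiguous segment. Callers address elements by a global index, and can query which rank an element has affinity to.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical PGAS-style block-distributed array: every simulated rank owns
// one segment of `block_size` elements (its physical address domain).
class GlobalArray {
public:
    GlobalArray(std::size_t num_ranks, std::size_t block_size)
        : block_size_(block_size),
          segments_(num_ranks, std::vector<double>(block_size, 0.0)) {
        assert(num_ranks > 0 && block_size > 0);
    }

    std::size_t size() const { return segments_.size() * block_size_; }

    // Affinity: the owning rank of a global index and its offset in that
    // rank's segment. Exposing this lets programmers favour local accesses.
    std::pair<std::size_t, std::size_t> locate(std::size_t global_index) const {
        assert(global_index < size());
        return std::make_pair(global_index / block_size_, global_index % block_size_);
    }

    bool is_local(std::size_t global_index, std::size_t my_rank) const {
        return locate(global_index).first == my_rank;
    }

    // One-sided access: the caller reads or writes the owner's memory
    // directly; the owner does not post a matching receive or send.
    double get(std::size_t global_index) const {
        const std::pair<std::size_t, std::size_t> loc = locate(global_index);
        return segments_[loc.first][loc.second];
    }
    void put(std::size_t global_index, double value) {
        const std::pair<std::size_t, std::size_t> loc = locate(global_index);
        segments_[loc.first][loc.second] = value;
    }

private:
    std::size_t block_size_;
    std::vector<std::vector<double>> segments_;
};
```

For example, with 4 ranks and 3 elements per rank, global index 7 lives on rank 2 at offset 1. A call such as `put(7, x)` from any rank writes there without any cooperation from rank 2. In a real distributed setting, a non-local access would turn into a network operation, which is why the distinction between local and remote data is kept visible.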
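The task-based model described above separates declaring tasks and their dependencies from deciding when they run. The following sketch illustrates this under simplifying assumptions and is not taken from StarPU, OpenMP or the paper. The `Task` struct and `run_task_graph` function are hypothetical. The "scheduler" is a single worker that uses Kahn's topological sort. A real runtime would hand every ready task to one of many workers concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical task description: a name, the work to do, and the indices of
// the tasks that must finish before this one may start.
struct Task {
    std::string name;
    std::function<void()> work;
    std::vector<std::size_t> deps;
};

// Executes the task graph so that no task starts before all of its
// dependencies have finished. Returns the names in execution order and
// throws std::runtime_error if the graph contains a cycle.
std::vector<std::string> run_task_graph(const std::vector<Task>& tasks) {
    const std::size_t n = tasks.size();
    std::vector<std::size_t> pending(n, 0);               // unfinished deps per task
    std::vector<std::vector<std::size_t>> successors(n);  // reverse edges
    for (std::size_t t = 0; t < n; ++t) {
        for (std::size_t d : tasks[t].deps) {
            assert(d < n);
            successors[d].push_back(t);
            ++pending[t];
        }
    }
    std::queue<std::size_t> ready;  // tasks whose dependencies are all satisfied
    for (std::size_t t = 0; t < n; ++t)
        if (pending[t] == 0) ready.push(t);

    std::vector<std::string> order;
    while (!ready.empty()) {
        const std::size_t t = ready.front();
        ready.pop();
        if (tasks[t].work) tasks[t].work();
        order.push_back(tasks[t].name);
        for (std::size_t s : successors[t])  // finishing t may release successors
            if (--pending[s] == 0) ready.push(s);
    }
    if (order.size() != n) throw std::runtime_error("task graph contains a cycle");
    return order;
}
```

In a graph where two independent compute tasks feed an exchange task, which in turn feeds an update task, both compute tasks become ready at once. A parallel scheduler could overlap them, and no global barrier is needed: each task waits only for the tasks it actually depends on.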
doi:10.1109/paw-atm49560.2019.00007 dblp:conf/sc/PopplBB19