A Comparative Analysis of Parallel Programming Models for C++
The parallel programming model used in a software development project can significantly affect the way concurrency is expressed in the code. It also comes with certain trade-offs with regard to ease of development, code readability, functionality, run-time overheads, and scalability. Especially the performance aspects can vary with the type of parallelism suited to the problem at hand. We investigate how well three popular multi-tasking frameworks for C++-Threading Building Blocks, Cilk Plus,
... d OpenMP 4.0-cope with three of the most common parallel scenarios: recursive divide-and-conquer algorithms; embarrassingly parallel loops; and loops that update shared variables. We implement merge sort, matrix multiplication, and dot product as test cases for the respective scenario in each of the programming models. We then go one step further and also apply the vectorisation support offered by Cilk Plus and OpenMP 4.0 to the data-parallel aspects of the loop-based algorithms. Our results demonstrate that certain configurations expose significant differences in the task creation and scheduling overheads among the tested frameworks. We also highlight the importance of testing how well an algorithm scales with the number of hardware threads available to the application.