Forward Progress on GPU Concurrency

Alastair Donaldson, Jeroen Ketema, Tyler Sorensen, John Wickerson
2017 13 Leibniz International Proceedings in Informatics Schloss Dagstuhl-Leibniz-Zentrum für Informatik   unpublished
The tutorial at CONCUR will provide a practical overview of work undertaken over the last six years in the Multicore Programming Group at Imperial College London, and with collaborators internationally, related to understanding and reasoning about concurrency in software designed for acceleration on GPUs. In this article we provide an overview of this work, which includes contributions to data race analysis, compiler testing, memory model understanding and formalisation, and most recently ... ts to enable portable GPU implementations of algorithms that require forward progress guarantees.

1998 ACM Subject Classification D.1.3 Concurrent Programming

1 Introduction

Graphics processing units (GPUs) offer a large degree of parallelism at a relatively low cost, and are now routinely applied to the acceleration of a wide variety of computational tasks that go well beyond the domain of graphics (see e.g. [39] for details of many application areas and developments). It is invariably harder to design a software application that takes advantage of GPU parallelism than it is to write a sequential version of the application that runs only on the CPU. Furthermore, parallel programming for GPUs is in many ways more complicated than parallel programming for multi-core CPUs. This is because achieving high performance requires working in low-level languages, such as CUDA [33] and OpenCL [24], which provide close-to-the-metal language features to enable mapping an algorithm to the architectural capabilities of a device. Numerous high-level programming models have been proposed to ease the burden of GPU programming, via automatic generation of low-level code, but they are not yet widely adopted.

Low-level programming of GPUs is made challenging by traditional concurrency bugs such as data races, by issues related to relaxed memory, and by the constrained hardware execution model on which software executes. Furthermore, reliable production compilers for
fatcat:dh6q7k6l2va7libb44u3cas5je