Heterogeneous-race-free memory models

Derek R. Hower, Blake A. Hechtman, Bradford M. Beckmann, Benedict R. Gaster, Mark D. Hill, Steven K. Reinhardt, David A. Wood
2014 Proceedings of the 19th international conference on Architectural support for programming languages and operating systems - ASPLOS '14  
Commodity heterogeneous systems (e.g., integrated CPUs and GPUs) now support a unified, shared memory address space for all components. Because the latency of global communication in a heterogeneous system can be prohibitively high, heterogeneous systems (unlike homogeneous CPU systems) provide synchronization mechanisms that only guarantee ordering among a subset of threads, which we call a scope. Unfortunately, the consequences and semantics of these scoped operations are not yet well understood. Without a formal and approachable model to reason about the behavior of these operations, we risk an array of portability and performance issues. In this paper, we embrace scoped synchronization with a new class of memory consistency models that add scoped synchronization to data-race-free models like those of C++ and Java. Called sequential consistency for heterogeneous-race-free (SC for HRF), the new models guarantee SC for programs with "sufficient" synchronization (no data races) of "sufficient" scope. We discuss two such models. The first, HRF-direct, works well for programs with highly regular parallelism. The second, HRF-indirect, builds on HRF-direct by allowing synchronization using different scopes in some cases involving transitive communication. We quantitatively show that HRF-indirect encourages forward-looking programs with irregular parallelism by showing up to a 10% performance increase in a task runtime for GPUs.

1 Sub-groups are optional in OpenCL, but will usually be defined on an SIMT GPU, and correspond to vector units.
2 We use OpenCL terminology in this paper. In Section 6 we discuss the CUDA equivalents.
3 As in CPU shared memory, not the CUDA "shared memory" scratchpad.

Figure 1. The OpenCL execution hierarchy. Labels: Grid, Work-group, Sub-group (hardware-specific size), Work-item; axes: Dimension X, Dimension Y, Dimension Z.
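To make the notion of a scope concrete, here is a minimal Python sketch. It is not taken from the paper. It models only the idea that a scoped synchronization orders a subset of threads, and it assumes that under HRF-direct both sides must use the same scope and that this scope must contain both of them. The names `WorkItem`, `scope_instance`, and `synchronizes_direct`, and the particular list of scopes, are hypothetical and chosen for illustration, loosely following the OpenCL terminology used above.

```python
from dataclasses import dataclass

# Illustrative scope names, narrowest to widest (an assumption, not the paper's list).
SCOPES = ("work-item", "sub-group", "work-group", "device", "system")


@dataclass(frozen=True)
class WorkItem:
    """Hypothetical position of one thread in the execution hierarchy."""
    device: int
    work_group: int
    sub_group: int
    lane: int


def scope_instance(item: WorkItem, scope: str) -> tuple:
    """Return an identifier for the scope instance that contains `item`."""
    if scope not in SCOPES:
        raise ValueError(f"unknown scope: {scope}")
    path = (item.device, item.work_group, item.sub_group, item.lane)
    # A wider scope keeps fewer components of the path:
    # system -> 0, device -> 1, work-group -> 2, sub-group -> 3, work-item -> 4.
    depth = len(SCOPES) - 1 - SCOPES.index(scope)
    return (scope,) + path[:depth]


def synchronizes_direct(a: WorkItem, scope_a: str, b: WorkItem, scope_b: str) -> bool:
    """Assumed HRF-direct-style rule: same scope on both sides, and it includes both."""
    return scope_a == scope_b and scope_instance(a, scope_a) == scope_instance(b, scope_b)


if __name__ == "__main__":
    a = WorkItem(device=0, work_group=0, sub_group=0, lane=0)
    b = WorkItem(device=0, work_group=0, sub_group=1, lane=3)
    c = WorkItem(device=0, work_group=1, sub_group=0, lane=0)
    print(synchronizes_direct(a, "work-group", b, "work-group"))  # same work-group
    print(synchronizes_direct(a, "work-group", c, "work-group"))  # different work-groups
    print(synchronizes_direct(a, "device", c, "device"))          # same device
```

The sketch deliberately leaves out HRF-indirect's transitive cases, since the abstract does not spell out when synchronization through different scopes is permitted.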
doi:10.1145/2541940.2541981 dblp:conf/asplos/HowerHBGHRW14 fatcat:iehbe3fbrff33erb6qfxclgi5i