A Study of Single and Multi-device Synchronization Methods in Nvidia GPUs [article]

Lingqi Zhang, Mohamed Wahib, Haoyu Zhang, Satoshi Matsuoka
2020 arXiv pre-print
GPUs are playing an increasingly important role in general-purpose computing. Many algorithms require synchronization at different levels of granularity within a single GPU. Additionally, the emergence of dense GPU nodes calls for multi-GPU synchronization. Nvidia's latest CUDA provides a variety of synchronization methods. Until now, there has been no full understanding of the characteristics of those synchronization methods. This work explores important undocumented features and provides an in-depth analysis of the performance considerations and pitfalls of the state-of-the-art synchronization methods for Nvidia GPUs. The provided analysis would be useful when making design choices for applications, libraries, and frameworks running on single and/or multi-GPU environments. We provide a case study of the commonly used reduction operator to illustrate how the knowledge gained in our analysis can be useful. We also describe our micro-benchmarks and measurement methods.
arXiv:2004.05371v1 fatcat:la43kfqazzca7oz2nyv6sqvzee