Performance Characteristics of Virtualized GPUs for Deep Learning

Scott Michael, Scott Teige, Junjie Li, John Michael Lowe, George Turner, Robert Henschel
2020 IEEE/ACM International Workshop on Interoperability of Supercomputing and Cloud Technologies (SuperCompCloud)
As deep learning techniques and algorithms become more and more common in scientific workflows, HPC centers are grappling with how best to provide GPU resources and support deep learning workloads. One novel method of deployment is to virtualize GPU resources allowing for multiple VM instances to have logically distinct virtual GPUs (vGPUs) on a shared physical GPU. However, there are many operational and performance implications to consider before deploying a vGPU service in an HPC center. In this paper, we investigate the performance characteristics of vGPUs for both traditional HPC workloads and for deep learning training and inference workloads. Using NVIDIA's vDWS virtualization software, we perform a series of HPC and deep learning benchmarks on both non-virtualized (bare metal) and vGPUs of various sizes and configurations. We report on several of the challenges we discovered in deploying and operating a variety of virtualized instance sizes and configurations. We find that the overhead of virtualization on HPC workloads is generally < 10%, and can vary considerably for deep learning, depending on the task.
doi:10.1109/supercompcloud51944.2020.00008 fatcat:ndq5hcwczfb5rn7hvza27764l4