Power modeling and architectural techniques for energy-efficient GPUs [article]

Sohan Lal, Technische Universität Berlin, Ben Juurlink
2019
Graphics Processing Units (GPUs) have evolved from fixed-function graphics processors to programmable general-purpose compute accelerators in a short time. The high theoretical performance and energy efficiency of GPUs compared to CPUs have made them indispensable for mainstream computing. However, their high power consumption and limited energy efficiency under low utilization are challenges that still need to be tackled. This thesis investigates bottlenecks that cause low performance and low energy efficiency in GPUs and proposes architectural techniques to address them. To conduct energy efficiency research for GPUs, we first develop a flexible and accurate power simulator called GPUSimPow. We use a hybrid approach for power modeling that improves flexibility and accuracy compared to previous approaches. Our evaluation shows an average relative error of 11.7% and 10.8% between simulated and measured power for the NVIDIA GT240 and GTX580, respectively. We then use GPUSimPow to study the energy efficiency of a wide range of kernels and categorize them into high-performance and low-performance kernels. We further investigate the bottlenecks of low-performance kernels by analyzing their occupancy. We quantify the gain in performance and energy efficiency when occupancy is increased. For instance, for a sub-category of low-occupancy kernels, increasing occupancy raises instructions per cycle by 11% on average and reduces energy consumption and energy-delay product by 9% and 23% on average, respectively. The full-occupancy kernels have low performance despite having the maximum number of threads. Further investigation shows that several of these kernels are memory-bound and can gain significantly from an increase in memory bandwidth. The traditional ways of increasing memory bandwidth, widening interfaces and increasing frequency, have issues such as high power consumption, large form factor, and difficulty in scaling the pin count. Memory compression is a promising alternative to increase the effective [...]
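The abstract reports an average relative error between simulated and measured power, and reductions in energy-delay product, without spelling out the formulas. The following is a minimal sketch of how such metrics are commonly computed. It assumes the usual definitions (mean of per-kernel absolute relative errors, and energy multiplied by execution time); the thesis may define them differently. All function names and all numbers in the usage note are hypothetical and are not taken from the thesis.

```python
def mean_relative_error(simulated, measured):
    """Mean of |simulated - measured| / measured over paired samples.

    Assumed definition; the thesis may aggregate errors differently.
    """
    if len(simulated) != len(measured) or not simulated:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(s - m) / m for s, m in zip(simulated, measured)]
    return sum(errors) / len(errors)


def energy_delay_product(energy_joules, delay_seconds):
    """Energy-delay product (EDP): energy times execution time."""
    return energy_joules * delay_seconds


def relative_reduction(baseline, improved):
    """Fractional reduction of a metric relative to its baseline value."""
    return (baseline - improved) / baseline
```

As a usage illustration with made-up values: `mean_relative_error([45.0, 55.0], [50.0, 50.0])` evaluates to about 0.10, and a kernel that goes from 10 J over 2 s to 9 J over 1.8 s gives `relative_reduction(energy_delay_product(10.0, 2.0), energy_delay_product(9.0, 1.8))` of about 0.19. Averages taken per kernel over a benchmark suite need not compose this way, so the abstract's averaged percentages should not be expected to multiply out exactly.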
doi:10.14279/depositonce-9156 fatcat:4r6temrq2nhfnnsf55w43qhy54