A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2014; you can also visit the original URL.
The file type is
Each new generation of GPUs vastly increases the resources available to GPGPU programs. GPU programming models (like CUDA) were designed to scale to use these resources. However, we find that CUDA programs actually do not scale to utilize all available resources, with over 30% of resources going unused on average for programs of the Parboil2 suite that we used in our work. Current GPUs therefore allow concurrent execution of kernels to improve utilization. In this work, we study concurrentdoi:10.1145/2499368.2451160 fatcat:ljkrkgicvnavbeyugydbpqhpf4