A unified optimizing compiler framework for different GPGPU architectures

Yi Yang, Ping Xiang, Jingfei Kong, Mike Mantor, Huiyang Zhou
2012 ACM Transactions on Architecture and Code Optimization (TACO)  
This paper presents a novel optimizing compiler for general purpose computation on graphics processing units (GPGPU). It addresses two major challenges of developing high performance GPGPU programs: effective utilization of GPU memory hierarchy and judicious management of parallelism. The input to our compiler is a naïve GPU kernel function, which is functionally correct but without any consideration for performance optimization. The compiler generates two kernels, one optimized for global ... ies and the other for texture memories. The proposed compilation process is effective for both AMD/ATI and NVIDIA GPUs. The experiments show that our optimized code achieves very high performance, either superior or very close to highly fine-tuned libraries. al. 2010]. The new material in this article includes: (1) we extend the compiler to support OpenCL, which is used by both NVIDIA and AMD GPUs; (2) three types of vectorization on memory accesses are included in the compiler; (3) novel code optimizations to utilize texture memory are incorporated into the compiler; (4) detailed performance results for texture memory are presented; (5) we evaluate our benchmarks on the latest NVIDIA GTX 480 and AMD/ATI HD 5870 GPUs.
doi:10.1145/2207222.2207225 fatcat:yx6p2hyun5cd3bstd76xp7xwom