Highly parallel HEVC decoding for heterogeneous systems with CPU and GPU

Biao Wang, Diego Felix de Souza, Mauricio Alvarez-Mesa, Chi Ching Chi, Ben Juurlink, Aleksandar Ilić, Nuno Roma, Leonel Sousa
2018 Signal Processing: Image Communication
The High Efficiency Video Coding (HEVC) standard provides higher compression efficiency than other video coding standards, but at the cost of an increased computational load, which makes it hard to achieve real-time encoding and decoding for ultra-high-resolution, high-quality video sequences. Graphics Processing Units (GPUs) are known to provide massive processing capability for highly parallel and regular computing kernels, but not all HEVC decoding procedures are suited to GPU execution.
If HEVC decoding is accelerated by GPUs, energy efficiency is another concern for heterogeneous CPU+GPU decoding. In this paper, a highly parallel HEVC decoder for heterogeneous CPU+GPU systems is proposed. It exploits the available parallelism in HEVC decoding on the CPU, on the GPU, and between the CPU and GPU devices simultaneously. On top of that, different workload balancing schemes can be selected according to the CPU and GPU computing resources devoted to decoding. Furthermore, an energy-optimized solution is proposed by tuning GPU clock rates. Results show that the proposed decoder achieves better performance than the state-of-the-art CPU decoder, and that the best-performing workload balancing scheme depends on the available CPU and GPU computing resources. In particular, with an NVIDIA Titan X Maxwell GPU and an Intel Xeon E5-2699v3 CPU, the proposed decoder delivers 167 frames per second (fps) for Ultra HD 4K videos when four CPU cores are used. Compared to the state-of-the-art CPU decoder using four CPU cores, the proposed decoder achieves a speedup of 2.2×. When decoding performance is bounded by the CPU, a system-wide energy reduction of up to 36% is achieved by using fixed (and lower) GPU clocks instead of the default dynamic clock settings on the GPU.
... (MMSP), Montreal, QC, 2016, pp. 1-6. The paper is extended with (i) an additional workload balancing scheme, (ii) an integrated energy measurement module for CPU and GPU devices, and (iii) energy-optimized decoding for heterogeneous systems by setting the GPU at fixed clock rates.
... instructions and advanced multi-threading is able to decode 4K UHD video on contemporary desktop CPUs. In addition to CPUs, modern computer systems often include Graphics Processing Units (GPUs), resulting in a class of heterogeneous architectures. Such heterogeneous CPU+GPU systems can potentially provide the computing capability needed for the next generation of UHD HEVC decoding.
In order to extract the maximum performance, HEVC decoding has to be mapped appropriately onto such heterogeneous architectures. First, the decoding sub-modules need to be distributed properly between the CPU and GPU according to their computing characteristics. Second, the decoding tasks assigned to the CPU and to the GPU have to be parallelized and optimized. Third, the decoding operations on the CPU and GPU require efficient communication and pipelining. Finally, multiple load balancing schemes are desirable because the computing resources available on the CPU and GPU devices can change.
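The last two requirements above, pipelining between the devices and choosing a load balancing scheme to match the available resources, can be illustrated with a toy throughput model. The following Python sketch is not the decoder described in the paper: the scheme names, the per-frame timings, and the assumptions of ideal linear CPU scaling and zero transfer cost are all hypothetical, chosen only to show why the best scheme can change with the number of CPU cores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scheme:
    """A hypothetical split of per-frame decoding work between the devices."""
    name: str
    cpu_ms: float  # single-core CPU time per frame (illustrative value)
    gpu_ms: float  # GPU time per frame (illustrative value)


def pipelined_fps(scheme: Scheme, cpu_cores: int) -> float:
    """Steady-state frame rate when the CPU and GPU stages overlap across frames.

    Assumes ideal linear CPU scaling and ignores transfer overhead, so the
    slower of the two stages alone determines the throughput.
    """
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    cpu_stage_ms = scheme.cpu_ms / cpu_cores
    bottleneck_ms = max(cpu_stage_ms, scheme.gpu_ms)
    return 1000.0 / bottleneck_ms


def best_scheme(schemes, cpu_cores):
    """Return the scheme with the highest modeled frame rate."""
    return max(schemes, key=lambda s: pipelined_fps(s, cpu_cores))


# Two made-up balancing schemes: one pushes more work to the GPU,
# the other keeps more work on the CPU.
SCHEMES = [
    Scheme("gpu_heavy", cpu_ms=16.0, gpu_ms=8.0),
    Scheme("cpu_heavy", cpu_ms=32.0, gpu_ms=4.0),
]

if __name__ == "__main__":
    for cores in (2, 8):
        chosen = best_scheme(SCHEMES, cores)
        print(cores, chosen.name, pipelined_fps(chosen, cores))
```

With these illustrative numbers, the GPU-heavy split wins when only two CPU cores are available, while the CPU-heavy split wins with eight, because with more cores the GPU stage becomes the bottleneck of the GPU-heavy split.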
doi:10.1016/j.image.2017.12.009 fatcat:sn64fkphk5fsflhttlvgykt2zq