Coarse Grain Parallelization of H.264 Video Decoder and Memory Bottleneck in Multi-Core Architectures

Ahmet Gürhanlı, Charlie Chung-Ping Chen, Shih-Hao Hung
2011 Journal of clean energy technologies  
Fine grain methods for parallelization of the H.264 decoder have good latency performance and less memory usage. However, they could not reach the scalability of coarse grain approaches although assuming a well-designed entropy decoder which can feed the increasing number of parallel working cores. We would like to introduce a GOP (Group of Pictures) level approach due to its high scalability, mentioning solution approaches for the well-known memory issues. Our design revokes the need to a
more » ... er for GOP start-codes which was used in the earlier methods. This approach lets all the cores work on the decoding task. Our experiments showed that the memory initialization operations may degrade the scalability of parallel applications substantially. The multi-core cache architecture appeared to be a critical point for getting the desired speedup. We observed a speedup of 7.63 with 8 processors having separate caches, and a speedup of 13.35 using 16 processors when a cache is shared by 2 processors. Index Terms-video compression, H.264 decoder, parallel processing, high-performance computing, image processing.
doi:10.7763/ijcte.2011.v3.335 fatcat:jafipjrvv5eutd6mef3bqm4oky