Analysis, fast algorithm, and VLSI architecture design for H.264/AVC intra frame coder

Yu-Wen Huang, Bing-Yu Hsieh, Tung-Chien Chen, Liang-Gee Chen
2005 IEEE transactions on circuits and systems for video technology (Print)  
Intra prediction with rate-distortion constrained mode decision is the most important technology in H.264/AVC intra frame coder, which is competitive with the latest image coding standard JPEG2000, in terms of both coding performance and computational complexity. The predictor generation engine for intra prediction and the transform engine for mode decision are critical because the operations require a lot of memory access and occupy 80% of the computation time of the entire intra compression
more » ... ocess. A low cost general purpose processor cannot process these operations in real time. In this paper, we proposed two solutions for platform-based design of H.264/AVC intra frame coder. One solution is a software implementation targeted at low-end applications. Context-based decimation of unlikely candidates, subsampling of matching operations, bit-width truncation to reduce the computations, and interleaved full-search/partial-search strategy to stop the error propagation and to maintain the image quality, are proposed and combined as our fast algorithm. Experimental results show that our method can reduce 60% of the computation used for intra prediction and mode decision while keeping the peak signal-to-noise ratio degradation less than 0.3 dB. The other solution is a hardware accelerator targeted at high-end applications. After comprehensive analysis of instructions and exploration of parallelism, we proposed our system architecture with four-parallel intra prediction and mode decision to enhance the processing capability. Hadamard-based mode decision is modified as discrete cosine transform-based version to reduce 40% of memory access. Two-stage macroblock pipelining is also proposed to double the processing speed and hardware utilization. The other features of our design are reconfigurable predictor generator supporting all of the 13 intra prediction modes, parallel multitransform and inverse transform engine, and CAVLC bitstream engine. A prototype chip is fabricated with TSMC 0.25-m CMOS 1P5M technology. Simulation results show that our implementation can process 16 mega-pixels ( 4096 4096) within 1 s, or namely 720 480 4:2:0 30 Hz video in real time, at the operating frequency of 54 MHz. The transistor count is 429 K, and the core size is only 1.855 1.885 mm 2 . Index Terms-Intra frame coder, ISO/IEC 14496-10 AVC, ITU-T Rec. H.264, Joint Video Team (JVT), VLSI architecture.
doi:10.1109/tcsvt.2004.842620 fatcat:fc3mcqqjz5huvmu7b5qoevwm7u