Bottlenecks in multimedia processing with SIMD style extensions and architectural enhancements
IEEE Transactions on Computers
Multimedia SIMD extensions such as MMX and AltiVec speed up media processing; however, our characterization shows that the attributes of current general-purpose processors enhanced with SIMD extensions do not match the access patterns and loop structures of media programs well. We find that 75-85% of the dynamic instructions in the processor instruction stream are supporting instructions needed to feed the SIMD execution units rather than true (useful) computations. This results in underutilization of the SIMD execution units: only 1-12% of their peak throughput is achieved. Rather than focusing on exploiting more data-level parallelism (DLP), this paper focuses on the instructions that support the SIMD computations and exploits both fine- and coarse-grained instruction-level parallelism (ILP) in the supporting instruction stream. We propose the MediaBreeze architecture, which uses hardware support for efficient address generation, looping, and data reorganization (permute, packing/unpacking, transpose, etc.). Our results on multimedia kernels show that a 2-way processor with SIMD extensions enhanced with MediaBreeze provides better performance than a 16-way processor with current SIMD extensions. For application benchmarks, a 2-/4-way processor with SIMD extensions augmented with MediaBreeze outperforms a 4-/8-way processor with SIMD extensions. A first-order approximation using ASIC synthesis tools and cell-based libraries shows that this acceleration costs a 10% increase in the area required by the MMX and SSE extensions (a 0.3% increase in overall chip area) and 1% of total processor power consumption.
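The abstract does not describe MediaBreeze's actual interface, so the following is only a minimal software model, under stated assumptions, of the general idea behind descriptor-driven address generation and looping. A nested, strided access pattern (here, reading a row-major 4x4 block in column order, i.e. a transpose) is described once by a small descriptor instead of being computed with explicit index arithmetic and loop branches. `StreamDescriptor`, `generate_offsets`, and `gather` are hypothetical names chosen for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class StreamDescriptor:
    """Hypothetical two-level strided access pattern (illustrative only,
    not the MediaBreeze design)."""
    base: int          # offset of the first element
    inner_count: int   # elements visited per inner sweep
    inner_stride: int  # distance between consecutive inner elements
    outer_count: int   # number of inner sweeps
    outer_stride: int  # distance between the starts of consecutive sweeps


def generate_offsets(d: StreamDescriptor) -> List[int]:
    """Return every offset the descriptor visits, in order.

    On a conventional SIMD processor, each of these offsets costs explicit
    index updates, multiply/add, and loop-branch instructions, i.e. the kind
    of supporting instructions the abstract counts as overhead.
    """
    return [
        d.base + i * d.outer_stride + j * d.inner_stride
        for i in range(d.outer_count)
        for j in range(d.inner_count)
    ]


def gather(src: Sequence[int], d: StreamDescriptor) -> List[int]:
    """Read src in the order produced by the descriptor."""
    return [src[k] for k in generate_offsets(d)]


if __name__ == "__main__":
    block = list(range(16))  # a 4x4 block stored row-major
    # Column-major traversal: step 4 elements within a sweep, 1 between sweeps.
    transpose_4x4 = StreamDescriptor(base=0, inner_count=4, inner_stride=4,
                                     outer_count=4, outer_stride=1)
    print(gather(block, transpose_4x4))
    # prints [0, 4, 8, 12, 1, 5, 9, 13, 2, 6, 10, 14, 3, 7, 11, 15]
```

In this model, the data reorganization (the transpose) is expressed purely as an address pattern. A hardware unit that walks such a descriptor would, in principle, remove the per-element index and branch instructions from the main instruction stream. How MediaBreeze actually encodes and executes these patterns is specified in the paper itself, not here.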