Efficient execution of augmented reality applications on mobile programmable accelerators
2013 International Conference on Field-Programmable Technology (FPT)
Mobile devices are ubiquitous in daily life. From smartphones to tablets, customers constantly demand richer user experiences through more visual and interactive interfaces, along with prolonged battery life. To meet these demands, accelerators are commonly adopted in systems-on-chip (SoCs) for various applications. Coarse-grained reconfigurable architecture (CGRA) is a promising solution that accelerates hot loops with software pipelining. Although CGRAs have been shown to support multimedia applications efficiently, more interactive applications such as augmented reality put much more pressure on performance and energy requirements. In this paper, we extend a heterogeneous CGRA to provide SIMD capabilities, which significantly improves performance and energy efficiency for augmented reality applications. We show that when data-level parallelism (DLP) can be exploited, it is more beneficial to run the code natively in SIMD mode than to transform it into instruction-level parallelism (ILP) and run it on the CGRA. To exploit this property, multiple processing elements in the CGRA are grouped to form homogeneous SIMD cores. To reduce the hardware overhead of fetching and replicating configurations in SIMD mode, we propose a ring network and a recycle buffer to pass the configuration around and to store it temporarily, with minimal impact on throughput. We also modify the memory access units and memory banks to support split memory transactions with forwarding for SIMD data accesses. To adapt to the proposed extension, we introduce a compilation technique for SIMD-mode code generation that maximizes the resource utilization of each SIMD core. Experimental results show an average performance improvement of 17.6% and energy savings of 16.9% over the heterogeneous CGRA.
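The idea of passing one configuration stream around a ring of SIMD cores, rather than fetching it separately for every core, can be illustrated with a small simulation. This is only a rough sketch under stated assumptions and not the paper's actual design: the function name `run_simd_ring`, the one-hop-per-cycle timing, and the modeling of a configuration as a Python callable are all hypothetical, and the recycle buffer and memory system are not modeled.

```python
from typing import Callable, List, Optional, Tuple

# Assumption: a "configuration" is modeled as one operation applied to every
# data element owned by a SIMD core. Real CGRA configurations are far richer.
Config = Callable[[int], int]


def run_simd_ring(
    configs: List[Config], lanes: List[List[int]]
) -> Tuple[List[List[int]], int, int]:
    """Simulate SIMD cores that receive configurations over a ring.

    Each configuration is fetched once, enters the ring at core 0, and moves
    one core further per cycle (hypothetical timing). Returns the final data
    per core, the number of configuration fetches, and the cycle count.
    """
    num_cores = len(lanes)
    data = [list(lane) for lane in lanes]
    ring: List[Optional[Config]] = [None] * num_cores  # None is a pipeline bubble
    fetches = 0
    total_cycles = len(configs) + num_cores - 1

    for cycle in range(total_cycles):
        # Shift every configuration one hop along the ring.
        for k in range(num_cores - 1, 0, -1):
            ring[k] = ring[k - 1]
        # Fetch the next configuration (if any) into the head of the ring.
        if cycle < len(configs):
            ring[0] = configs[cycle]
            fetches += 1
        else:
            ring[0] = None
        # Every core executes whatever configuration it currently holds.
        for k in range(num_cores):
            op = ring[k]
            if op is not None:
                data[k] = [op(x) for x in data[k]]

    return data, fetches, total_cycles
```

In this toy model, four cores running a two-step configuration stream need only two fetches instead of eight, at the cost of a fill latency of one cycle per additional core. That is the kind of trade-off the abstract describes as having minimal impact on throughput.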