Boosting mobile GPU performance with a decoupled access/execute fragment processor

Jose-Maria Arnau, Joan-Manuel Parcerisa, Polychronis Xekalakis
2012 39th Annual International Symposium on Computer Architecture (ISCA)
Smartphones represent one of the fastest-growing markets, providing significant hardware/software improvements every few months. However, supporting these capabilities reduces the operating time per battery charge. The CPU/GPU component is left with only a shrinking fraction of the power budget, since most of the energy is consumed by the screen and the antenna. In this paper, we focus on improving the energy efficiency of the GPU, since graphical applications constitute an important part of the existing market. Moreover, the trend towards better screens will inevitably lead to a higher demand for improved graphics rendering. We show that the main bottleneck for these applications is the texture cache and that traditional techniques for hiding memory latency (prefetching, multithreading) either do not work well or come at a high energy cost. We thus propose the migration of GPU designs towards the decoupled access/execute concept. Furthermore, we significantly reduce bandwidth usage in the decoupled architecture by exploiting inter-core data sharing. Using commercial Android applications, we show that the final design can achieve 93% of the performance of a heavily multithreaded GPU while providing energy savings of 34%.

1. For a Tegra-like system, we found that if no multithreading is used to hide the latency, the performance hit for a set of 3D games for Android is 140%. A Tegra with perfect caches can provide up to a 285% improvement over the non-threaded version.
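To make the decoupled access/execute idea concrete, here is a toy cycle-count model. It is a minimal sketch under loudly stated assumptions, not the authors' simulator or their exact microarchitecture: each fragment needs exactly one texture fetch with a fixed latency followed by a fixed amount of compute, fragments execute in order, and the access unit issues at most one fetch per cycle into a bounded queue. The function names `coupled_cycles` and `decoupled_cycles` and the parameter `queue_depth` are hypothetical. The model ignores prefetching, multithreading, and the inter-core data sharing described above.

```python
def coupled_cycles(n, latency, compute):
    """Non-threaded, coupled core: every fragment stalls for the full
    texture-fetch latency before it can compute (hypothetical model)."""
    return n * (latency + compute)


def decoupled_cycles(n, latency, compute, queue_depth):
    """Decoupled access/execute (hypothetical model): an access unit runs
    ahead issuing texture fetches into a queue of `queue_depth` entries,
    while an in-order execute unit consumes fragments as their data arrives."""
    issue = [0] * n   # cycle at which fragment i's fetch is issued
    start = [0] * n   # cycle at which fragment i starts executing
    exec_free = 0     # cycle at which the execute unit becomes free
    for i in range(n):
        # At most one fetch issued per cycle.
        earliest = issue[i - 1] + 1 if i > 0 else 0
        # A queue entry is freed only when the execute unit consumes it,
        # so the access unit may run at most queue_depth fragments ahead.
        if i >= queue_depth:
            earliest = max(earliest, start[i - queue_depth])
        issue[i] = earliest
        ready = issue[i] + latency          # texture data arrives
        start[i] = max(ready, exec_free)    # wait for data and a free unit
        exec_free = start[i] + compute
    return exec_free


if __name__ == "__main__":
    print(coupled_cycles(100, 100, 10))        # 11000
    print(decoupled_cycles(100, 100, 10, 16))  # 1100
```

With 100 fragments, a 100-cycle fetch latency, 10 cycles of compute each, and a 16-entry queue, the coupled model takes 11000 cycles while the decoupled model takes 1100, because the access unit keeps enough fetches in flight that the execute unit is never starved after the first miss. With a 1-entry queue the decoupled model still takes 10010 cycles, which shows that the run-ahead distance, not decoupling alone, is what hides the latency in this toy model.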
doi:10.1109/isca.2012.6237008 dblp:conf/isca/ArnauPX12 fatcat:wu7qgj5rqnh2rhftcvo5xdenge