CIRCA-GPUs: Increasing Instruction Reuse Through Inexact Computing in GP-GPUs

Abbas Rahimi, Luca Benini, Rajesh K. Gupta
2016 IEEE Design & Test
CMOS transistor scaling no longer provides uncompromised performance and power gains for integrated computing platforms [1]. Solutions that improve energy efficiency (performance per watt) while retaining as much generality as possible are highly desirable. Modern applications including graphics, multimedia, web search, and data analytics offer massive parallelism and significant degrees of tolerance to inexact computing. A general-purpose programmable parallel architecture, such as those found in the general-purpose graphics processing units (GP-GPUs), can jointly exploit these two key application characteristics to improve energy efficiency.

Inexact computing, or approximate computing, exploits application tolerance to imprecision and trades small losses in output quality for improved performance and energy [2], [3]. These error-tolerant applications exhibit enhanced error resilience at the application level when multiple valid output values are permitted, in effect creating a relation from input values to (multiple) output values. Some lack of precision in computed results can be tolerated as acceptable from the end application's point of view. Besides the opportunity for inexact computing in these applications, their parallelism exposes inherent value similarity and locality inside a parallelized program [4]-[6]. This exposed property makes it possible to avoid redundant executions by reusing the result of a similar instruction rather than executing the actual instruction.

Instruction reuse comes from the observation that many instructions can be skipped if another instance has already been executed using the same input values [7]. Instruction reuse stores the outcome of an instruction in a hardware table, so a processor can reuse it temporally when it performs the same instruction with the same input values. The combined effort of inexact computing and instruction reuse can yield significant energy-efficiency gains, since many of the applications that can benefit from parallelism are amenable to approximation. However, there is a lack of techniques ...

Editor's notes: The authors introduce a method that exploits fine-grained parallelism and approximate computing in GP-GPU architecture to increase the energy efficiency through spatial and temporal reuse of instructions.
- Jörg Henkel, Karlsruhe Institute of Technology
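The temporal instruction reuse described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' hardware design: the class name `ReuseTable`, the helper `quantize`, the LRU replacement policy, the table capacity, and the choice to match operands approximately by ignoring low-order float32 mantissa bits are all hypothetical choices made for illustration. It models temporal reuse only; spatial reuse across parallel lanes is not covered.

```python
import math
import struct
from collections import OrderedDict


def quantize(x: float, dropped_bits: int) -> int:
    """Return the float32 bit pattern of x with the lowest mantissa bits cleared.

    Assumption for illustration: operands count as "similar" when they agree
    on every bit except the `dropped_bits` least significant ones.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    mask = (0xFFFFFFFF << dropped_bits) & 0xFFFFFFFF
    return bits & mask


class ReuseTable:
    """Hypothetical software model of a small instruction reuse table."""

    def __init__(self, capacity: int = 4, dropped_bits: int = 0):
        self.capacity = capacity
        self.dropped_bits = dropped_bits  # 0 means exact matching
        self.entries = OrderedDict()      # (opcode, operand keys) -> result
        self.hits = 0
        self.misses = 0

    def execute(self, opcode, func, *operands):
        key = (opcode,) + tuple(quantize(v, self.dropped_bits) for v in operands)
        if key in self.entries:
            # Reuse: skip the computation and return the stored result.
            self.hits += 1
            self.entries.move_to_end(key)
            return self.entries[key]
        # Miss: execute the instruction and remember its outcome.
        self.misses += 1
        result = func(*operands)
        self.entries[key] = result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return result


# Compare exact matching with approximate matching on nearby inputs.
exact = ReuseTable(dropped_bits=0)
approx = ReuseTable(dropped_bits=12)
for x in [1.0, 1.0001, 1.0002, 2.0, 1.0]:
    exact.execute("sqrt", math.sqrt, x)
    approx.execute("sqrt", math.sqrt, x)

print("exact matching: ", exact.hits, "hits,", exact.misses, "misses")
print("approx matching:", approx.hits, "hits,", approx.misses, "misses")
```

With exact matching only the repeated input 1.0 is reused, while approximate matching also reuses the stored result for 1.0001 and 1.0002, returning sqrt(1.0) in place of the precise value. That difference is the small loss in output quality traded for fewer executed instructions.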
doi:10.1109/mdat.2015.2497334 fatcat:z456aqc2sffm5nvud2fsabrrti