Design and Evaluation of Confidence-Driven Error-Resilient Systems

Chia-Hsiang Chen, David Blaauw, Dennis Sylvester, Zhengya Zhang
2014 IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Deeply scaled CMOS circuits are increasingly susceptible to transient faults and soft errors; emerging post-CMOS devices can be more vulnerable, sometimes exhibiting erratic errors of arbitrary duration. Applying timing and supply voltage margin is wasteful and becoming ineffective, and conventional checking and sparing techniques provide only a limited error coverage against widely varying errors. We propose a confidence-driven computing (CDC) model for an adaptive protection against ... istic errors. The CDC model employs fine-grained temporal redundancy and confidence checking for a faster adaptation and tunable reliability. The CDC model can be extended to deeply scaled CMOS circuits that are mainly affected by transient faults and soft errors, where an early checking (EC) technique can be used to perform independent error checking for more flexibility and better performance. To evaluate the CDC model, we apply a sample-based field-programmable gate array emulation along with real-time error injection. The CDC model is shown to adapt to fluctuating error rates and enhance the system reliability by effectively trading off performance. To evaluate the EC technique at a finer time scale, we create a new event-based simulation to capture path delay distribution, error model, and their interactions. The EC technique improves the system reliability by more than four orders of magnitude when errors are of short duration. Both the CDC model and the EC technique are synthesized in a 45-nm CMOS technology for cost estimates: 1) the area overhead is as low as 12% and 2) energy overhead can be limited to 19%.
Index Terms: Error detection, error simulation, field-programmable gate array (FPGA) emulation, reliability, resilient design.
doi:10.1109/tvlsi.2013.2277351 fatcat:ni37vig4i5eozcd4kbyu7ymcxu
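
The abstract's idea of temporal redundancy combined with confidence checking can be illustrated with a small software sketch. This is not the authors' hardware design: it assumes, purely for illustration, that a result is accepted once the same value has been produced a configurable number of times in a row, and that any mismatch resets the count. The names `confidence_driven_compute`, `make_faulty_adder`, `threshold`, and `error_rate` are hypothetical and do not come from the paper.

```python
import random


def confidence_driven_compute(compute, threshold, max_attempts=1000):
    """Re-run `compute` until one value repeats `threshold` times in a row.

    Assumed acceptance rule (illustrative, not taken from the paper):
    a matching result raises the confidence count, a mismatch restarts it.
    Returns (accepted_value, number_of_runs_used).
    """
    candidate = None
    confidence = 0
    for attempt in range(1, max_attempts + 1):
        result = compute()
        if confidence > 0 and result == candidate:
            confidence += 1          # same value again: more confident
        else:
            candidate = result       # new value: restart the count
            confidence = 1
        if confidence >= threshold:
            return candidate, attempt
    raise RuntimeError("confidence threshold not reached")


def make_faulty_adder(a, b, error_rate, rng):
    """Hypothetical unreliable adder: flips one low bit with probability error_rate."""
    def compute():
        correct = a + b
        if rng.random() < error_rate:
            return correct ^ (1 << rng.randrange(8))
        return correct
    return compute


if __name__ == "__main__":
    rng = random.Random(1)
    for rate in (0.0, 0.1, 0.3):
        adder = make_faulty_adder(20, 22, rate, rng)
        value, runs = confidence_driven_compute(adder, threshold=4)
        print(f"error_rate={rate}: accepted {value} after {runs} runs")
```

Under these assumptions, raising `threshold` lowers the chance of accepting a wrong value but needs more runs, which mirrors the reliability-for-performance trade-off the abstract describes. With no errors, the sketch needs exactly `threshold` runs; as the error rate rises, the number of runs grows on its own.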