Examining ACE analysis reliability estimates using fault-injection

Nicholas J. Wang, Aqeel Mahesri, Sanjay J. Patel
2007 Proceedings of the 34th annual international symposium on Computer architecture - ISCA '07  
ACE analysis is a technique to provide an early reliability estimate for microprocessors. ACE analysis couples data from abstract performance models with low level design details to identify and rule out transient faults that will not cause incorrect execution. While many transient faults are analyzable in ACE analysis frameworks, some are not. As a result, ACE analysis is conservative and provides a lower bound for the reliability of a processor design. Bounding the reliability of a design is useful since it can guarantee that the given design will meet reliability goals. In this work, we quantify and identify the sources of ACE analysis conservatism by comparing an ACE analysis methodology against a rigorous fault-injection study. We evaluate two flavors of ACE analysis: a "simple" analysis and a refined analysis, finding that even the refined analysis overestimates the soft error vulnerability of an instruction scheduler by 2-3x. The conservatism stems from two key sources: from lack of detail in abstract performance models and from what we term Y-Bits, a result of the single-pass simulation methodology that is typical of ACE analysis. We also examine the efficacy of applying ACE analysis to a class of "partial coverage" error mitigation techniques. In particular, we perform a case study on one such technique and extrapolate our findings to others.
doi:10.1145/1250662.1250719 dblp:conf/isca/WangMP07 fatcat:5adwbd5jebd5lk2h5rhpoyakgy