Estimating Silent Data Corruption Rates Using a Two-Level Model [article]

Siva Kumar Sastry Hari, Paolo Rech, Timothy Tsai, Mark Stephenson, Arslan Zulfiqar, Michael Sullivan, Philip Shirvani, Paul Racunas, Joel Emer, Stephen W. Keckler
<span title="2020-04-28">2020</span> <i > arXiv </i> &nbsp; <span class="release-stage" >pre-print</span>
High-performance and safety-critical system architects must accurately evaluate the application-level silent data corruption (SDC) rates of processors to soft errors. Such an evaluation requires error propagation all the way from particle strikes on low-level state up to the program output. Existing approaches that rely on low-level simulations with fault injection cannot evaluate full applications because of their slow speeds, while application-level accelerated fault testing in accelerated
more &raquo; ... ticle beams is often impractical. We present a new two-level methodology for application resilience evaluation that overcomes these challenges. The proposed approach decomposes application failure rate estimation into (1) identifying how particle strikes in low-level unprotected state manifest at the architecture-level, and (2) measuring how such architecture-level manifestations propagate to the program output. We demonstrate the effectiveness of this approach on GPU architectures. We also show that using just one of the two steps can overestimate SDC rates and produce different trends---the composition of the two is needed for accurate reliability modeling.
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2005.01445v1">arXiv:2005.01445v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/f4fd2rmii5bj3lpm5fszqsrg7a">fatcat:f4fd2rmii5bj3lpm5fszqsrg7a</a> </span>
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20200907140250/https://arxiv.org/pdf/2005.01445v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/23/79/237962984c60fbaf634f0511b499508b024496eb.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/2005.01445v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>