
Estimating Silent Data Corruption Rates Using a Two-Level Model [article]

Siva Kumar Sastry Hari, Paolo Rech, Timothy Tsai, Mark Stephenson, Arslan Zulfiqar, Michael Sullivan, Philip Shirvani, Paul Racunas, Joel Emer, Stephen W. Keckler
2020 arXiv   pre-print
High-performance and safety-critical system architects must accurately evaluate the application-level silent data corruption (SDC) rates of processors to soft errors.  ...  We also show that using just one of the two steps can overestimate SDC rates and produce different trends; the composition of the two is needed for accurate reliability modeling.  ...  and lead to Silent Data Corruption (SDC).  ... 
arXiv:2005.01445v1 fatcat:f4fd2rmii5bj3lpm5fszqsrg7a

Evaluating the impact of Undetected Disk Errors in RAID systems

Eric W. D. Rozier, Wendy Belluomini, Veera Deenadhayalan, Jim Hafner, KK Rao, Pin Zhou
2009 2009 IEEE/IFIP International Conference on Dependable Systems & Networks  
Our implementation enables us to model arbitrary storage systems and workloads and estimate the rate of undetected data corruptions.  ...  While RAID systems have proven effective in protecting data from traditional disk failures, silent data corruption events remain a significant problem unaddressed by RAID.  ...  We can estimate the rates of UDE occurrence by using a combination of the data presented in [1] and the data presented in [2].  ... 
doi:10.1109/dsn.2009.5270353 dblp:conf/dsn/RozierBDHRZ09 fatcat:ol6kkftzh5cxtnvay3ysdmf4ha

System-level analysis of soft error rates and mitigation trade-off explorations

Zhe Ma, Francky Catthoor, Frank Vermunt, Teun Hendriks
2010 2010 IEEE International Reliability Physics Symposium  
This paper presents a novel system-level analysis of soft error rates (SER) based on the Transaction Level Model (TLM) of a targeted System-On-a-Chip (SoC).  ...  This analysis runs 1000x faster than the conventional SoC analysis using a gate-level model.  ...  The only difference is that for silent data corruptions, we need to calculate them for each output data object.  ... 
doi:10.1109/irps.2010.5488685 fatcat:nd6cmzvdg5hfnfz5og5j24r3hi

Cross-Layer Resilience Against Soft Errors: Key Insights [chapter]

Daniel Mueller-Gritschneder, Eric Cheng, Uzair Sharif, Veit Kleeberger, Pradip Bose, Subhasish Mitra, Ulf Schlichtmann
2020 Embedded Systems  
Such soft errors may cause malfunction of the system due to corruption of data or control flow, which may lead to unacceptable risks for life or property in safety-critical applications.  ...  Here, cross-layer resilience techniques aim at finding lower cost solutions by providing accurate estimation of soft error resilience combined with a systematic exploration of protection techniques that  ...  The blue bar shows the rate of silent data corruption caused when a faulty cache line is read.  ... 
doi:10.1007/978-3-030-52017-5_11 fatcat:sbwfuocpz5duzarrb4ysbee3cm

Understanding soft error propagation using Efficient vulnerability-driven fault injection

Xin Xu, Man-Lap Li
2012 IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)  
With CriticalFault, our results show that the injection space is reduced by 29% and 59% of the biased injections cause either software aborts or silent data corruptions, both are improvements from SFI  ...  To evaluate, statistical fault injection (SFI) is often used to estimate the error coverage of the underlying method.  ...  Specifically, an injected error that is not derated will result in an application-level or OS-level abort or a silent data corruption.  ... 
doi:10.1109/dsn.2012.6263923 dblp:conf/dsn/XuL12 fatcat:e7k7kwagszalxdalohl25ow6pu

The Significance of Storage in the "Cost of Risk" of Digital Preservation

Richard Wright, Ant Miller, Matthew Addis
2009 International Journal of Digital Curation  
We review the vital role of storage and show how planning for long-term preservation of data should consider the risks involved in using digital storage technology.  ...  We examine current modelling of costs and risks in digital preservation, concentrating on the Total Cost of Risk when using digital storage systems for preserving audiovisual material.  ...  If the audio is sampled at 44.1 kHz (the rate used on CDs), each sample represents about 23 micro-seconds of data.  ... 
doi:10.2218/ijdc.v4i3.125 fatcat:uuufilkkdjg3df5mghtfra32mq
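The "about 23 microseconds" figure in the snippet is the sampling period, 1/f_s. The short check below assumes only the CD rate of 44,100 samples per second stated in the snippet.

```python
# CD audio sampling rate quoted in the snippet (44.1 kHz)
SAMPLE_RATE_HZ = 44_100


def sample_period_us(rate_hz: float) -> float:
    """Return the duration covered by one sample, in microseconds."""
    return 1e6 / rate_hz


period = sample_period_us(SAMPLE_RATE_HZ)
print(f"{period:.2f} us per sample")  # 22.68, i.e. "about 23 microseconds"
```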

Resilient N-Body Tree Computations with Algorithm-Based Focused Recovery: Model and Performance Analysis [chapter]

Aurélien Cavelan, Aiman Fang, Andrew A. Chien, Yves Robert
2017 Lecture Notes in Computer Science  
This paper presents a model and performance study for Algorithm-Based Focused Recovery (ABFR) applied to N-body computations, subject to latent errors.  ...  We make a detailed comparison with the classical Checkpoint/Restart (CR) approach.  ...  corrupted data.  ... 
doi:10.1007/978-3-319-72971-8_8 fatcat:4j3jnyaq4fdoxli7xu5fzfymum

Modeling the Fault Tolerance Consequences of Deduplication

Eric W.D. Rozier, William H. Sanders, Pin Zhou, Nagapramod Mandagere, Sandeep M. Uttamchandani, Mark L. Yakushev
2011 2011 IEEE 30th International Symposium on Reliable Distributed Systems  
We present a framework composed of data analysis methods and a model of data deduplication that is useful in studying the reliability impact of data deduplication.  ...  The framework is useful for determining a deduplication strategy that is estimated to satisfy a set of reliability constraints supplied by a user.  ...  data loss, and the impact of silent data corruptions, though the former is easily countered by using higher level RAID configurations.  ... 
doi:10.1109/srds.2011.18 dblp:conf/srds/RozierSZMUY11 fatcat:4z2nkaaxxredfgl35qjgt6cwxe

Bamboo ECC: Strong, safe, and flexible codes for reliable computer memory

Jungrae Kim, Michael Sullivan, Mattan Erez
2015 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA)  
Relative to the state-of-the-art single-tier error protection, Bamboo ECC codes have superior correction capabilities, all but eliminate the risk of silent data corruption, and can also increase redundancy  ...  Growing computer system sizes and levels of integration have made memory reliability a primary concern, necessitating strong memory error protection.  ...  If there are two pin faults on two chips, QPC can correct both of them while AMD chipkill must report a DUE (or, in some cases, AMD chipkill results in silent data corruption).  ... 
doi:10.1109/hpca.2015.7056025 dblp:conf/hpca/KimSE15 fatcat:gwtcs4iibrf53pvoy76hykbbxe


Seong-Lyong Gong, Minsoo Rhu, Jungrae Kim, Jinsuk Chung, Mattan Erez
2015 Proceedings of the 48th International Symposium on Microarchitecture - MICRO-48  
We propose a novel memory protection scheme called CLEAN (Chipkill-LEvel reliable and Access granularity Negotiable), which enables us to balance the contradicting demands of fine-grained (FG) access and  ...  To close a potentially significant detection coverage gap due to CLEAN's detection mechanism coupled with permanent faults, we design a simple mechanism, access granularity enforcement.  ...  silent data corruption events.  ... 
doi:10.1145/2830772.2830799 dblp:conf/micro/GongRKCE15 fatcat:mjjew46fwzgv3iuilrlfnjmkki

Addressing multiple bit/symbol errors in DRAM subsystem [article]

Ravikiran Yeleswarapu, Arun K. Somani
2020 arXiv   pre-print
Our scheme makes use of a hash in combination with Error Correcting Code (ECC) to avoid silent data corruptions (SDCs). SSCMSD can also enhance the capability of detecting errors in address bits.  ...  Current servers mostly use CHIPKILL based schemes to tolerate up to one/two symbol errors per DRAM beat.  ...  The probability of false negative is estimated by using the upper bound on SDC rate for the baseline SSC-decoder (8%) and collision probability for an N-bit hash is estimated by birthday paradox (2^(−N/2)  ... 
arXiv:1908.01806v2 fatcat:dwti5nsgrja5dcj2ocvvqmudpm
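The snippet combines an 8% upper bound on the baseline SSC decoder's SDC rate with a birthday-paradox collision estimate of 2^(−N/2) for an N-bit hash. Because the snippet is truncated, it does not show how the two factors are combined. The sketch below assumes they are independent events and multiplies them. The function names are hypothetical.

```python
def birthday_collision_bound(hash_bits: int) -> float:
    """Birthday-paradox collision estimate for an N-bit hash: 2**(-N/2)."""
    return 2.0 ** (-hash_bits / 2)


def false_negative_estimate(sdc_upper_bound: float, hash_bits: int) -> float:
    """Hypothetical combination: an error escapes only if the baseline decoder
    produces an SDC AND the hash collides, treated here as independent events."""
    return sdc_upper_bound * birthday_collision_bound(hash_bits)


# Sweep a few illustrative hash widths using the 8% bound from the snippet
for n in (16, 32, 64):
    print(n, false_negative_estimate(0.08, n))
```

Under this assumption, each extra pair of hash bits halves the estimated false-negative probability. The 8% factor only scales the curve.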

Exploring Partial Replication to Improve Lightweight Silent Data Corruption Detection for HPC Applications [chapter]

Eduardo Berrocal, Leonardo Bautista-Gomez, Sheng Di, Zhiling Lan, Franck Cappello
2016 Lecture Notes in Computer Science  
Silent data corruption (SDC) poses a great challenge for high-performance computing (HPC) applications as we move to extreme-scale systems.  ...  Accurate predictions allow us to detect corruptions when data values are far "enough" from them.  ...  Introduction Silent data corruption (SDC) involves corruption to an application's memory state (including both code and data) caused by undetected soft errors, that is, errors that modify the information  ... 
doi:10.1007/978-3-319-43659-3_31 fatcat:toubdyjsq5b65lvizf5kx7me7e
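The snippet describes prediction-based detection: a value is suspect when it lies far "enough" from a predicted value. The sketch below is a generic illustration of that idea only. It uses a hypothetical linear extrapolation from the two previous time steps and a fixed threshold. It is not the paper's predictor, and it does not model its partial-replication scheme.

```python
from typing import Sequence


def linear_prediction(prev2: float, prev1: float) -> float:
    """Predict the next value by extrapolating the last two steps linearly."""
    return 2 * prev1 - prev2


def flag_suspect(history: Sequence[float], observed: float, threshold: float) -> bool:
    """Flag `observed` as a possible SDC if it lies farther than `threshold`
    from the extrapolated prediction. Needs at least two prior values."""
    if len(history) < 2:
        raise ValueError("need at least two previous values")
    predicted = linear_prediction(history[-2], history[-1])
    return abs(observed - predicted) > threshold


series = [1.0, 1.1, 1.2]
print(flag_suspect(series, 1.3, threshold=0.05))  # False: smooth continuation
print(flag_suspect(series, 9.3, threshold=0.05))  # True: e.g. after a bit flip
```

The threshold sets the trade-off between false positives and missed corruptions. A more accurate predictor allows a tighter threshold.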

Bit Preservation: A Solved Problem?

David S. H. Rosenthal
2010 International Journal of Digital Curation  
This paper is in four parts: Claims, reviewing a typical claim of storage system reliability, showing that it provides no useful information for bit preservation purposes. Theory, proposing "bit half-life  ...  For years, discussions of digital preservation have routinely featured comments such as "bit preservation is a solved problem; the real issues are ...".  ...  a significant rate of silent disk errors that would lead to silent data corruption.  ... 
doi:10.2218/ijdc.v5i1.148 fatcat:4jrjl3kqa5d37g5inrqwazcxae

Political Risk and Real Exchange Rate: What Can We Learn from Recent Developments in Panel Data Econometrics for Emerging and Developing Countries?

Mohsen Bahmani-Oskooee, Thouraya Hadj Amor, Ridha Nouira, Christophe Rault
2018 Journal of Quantitative Economics  
We use annual data from the International Country Risk Guide database over the 1984 to 2016 period.  ...  : i) countries experiencing a high degree of corruption, a high risk to investment, or a high degree of political instability tend to experience a real exchange rate depreciation, ii) there exists strong  ...  The previous literature on the estimation of long-run effects using panel data allowed for the estimation of long-run effects using panel data, but it doesn't allow for cross-sectionally dependent errors  ... 
doi:10.1007/s40953-018-0145-4 fatcat:kzt6dukjs5fmvggqumjadi3tla

Zettabyte reliability with flexible end-to-end data integrity

Yupu Zhang, Daniel S. Myers, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
2013 2013 IEEE 29th Symposium on Mass Storage Systems and Technologies (MSST)  
Z²FS provides dynamical tradeoffs between performance and protection and offers Zettabyte Reliability, which is one undetected corruption per Zettabyte of data read.  ...  For comparison, we implement a straightforward End-to-End ZFS (E²ZFS) with the same protection scheme for all components.  ...  Overview The reliability of a storage system can be evaluated based on how likely corruption would occur. There are two types of corruption: detected and undetected (silent data corruption, SDC).  ... 
doi:10.1109/msst.2013.6558423 dblp:conf/mss/ZhangMAA13 fatcat:ky5jcx5bhjgcxch2axsk23yudm
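"Zettabyte Reliability" is defined in the snippet as one undetected corruption per zettabyte read. The sketch below converts that target into expected counts and probabilities. Three assumptions are not stated in the snippet: SI units (1 ZB = 10^21 bytes), a constant per-byte rate, and Poisson-distributed corruptions. The 4 KiB block size is purely illustrative.

```python
import math

# Assumption: SI zettabyte (10**21 bytes); the binary alternative would be 2**70.
ZETTABYTE = 10**21


def expected_undetected_corruptions(bytes_read: float,
                                    bytes_per_corruption: float = ZETTABYTE) -> float:
    """Expected count of undetected corruptions at one per `bytes_per_corruption` read."""
    return bytes_read / bytes_per_corruption


def prob_at_least_one(bytes_read: float,
                      bytes_per_corruption: float = ZETTABYTE) -> float:
    """Probability of one or more undetected corruptions, assuming a Poisson process."""
    lam = expected_undetected_corruptions(bytes_read, bytes_per_corruption)
    return -math.expm1(-lam)  # 1 - exp(-lam), numerically accurate for small lam


def per_block_probability(block_bytes: int = 4096) -> float:
    """Chance that a single (hypothetical 4 KiB) block read is silently corrupted."""
    return prob_at_least_one(block_bytes)


print(expected_undetected_corruptions(10**15))  # reading one petabyte
print(prob_at_least_one(ZETTABYTE))             # reading a full zettabyte
print(per_block_probability())
```

Under the Poisson assumption, reading a full zettabyte still gives about a 63% chance of at least one silent corruption, not certainty.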
Showing results 1–15 out of 13,975 results