An evaluation of the error detection mechanisms in MARS using software-implemented fault injection

Emmerich Fuchs
1996 Lecture Notes in Computer Science  
The concept of fail-silent nodes greatly simplifies the design and safety proof of highly dependable fault-tolerant computer systems. The MAintainable Real-Time System (MARS) is a computer system in which the hardware, operating system, and application-level error detection mechanisms are designed to ensure, with high probability, that nodes are fail-silent. The goal of this paper is two-fold. First, the error detection capabilities of the different mechanisms are evaluated in software-implemented fault injection experiments using the well-known bit-flip fault model. The results show that a fail silence coverage of at least 85% is achievable by combining the hardware and system-level software error detection mechanisms. With the additional use of application-level error detection mechanisms, a fail silence coverage of 100% was achieved. Second, the limits of the application-level error detection mechanisms are evaluated. In these experiments, the fault model consists of highly improbable residual faults, chosen deliberately to force fail silence violations. Despite this worst-case scenario, more than 50% of the presumed undetectable errors were detected by other mechanisms and hence did not lead to fail silence violations.
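To make the bit-flip fault model and the notion of coverage concrete, the following is a minimal Python sketch. It is not the MARS injection tool or any of its mechanisms. Everything here is a hypothetical assumption for illustration: the names `flip_bit`, `coverage`, `range_check`, and `replica_check`, the 16-bit word, the golden value 500, and the valid range [0, 1000]. The sketch injects every possible single-bit flip into one stored word. It then counts the fraction of flips that at least one detector flags. For simplicity it treats every injected flip as an activated error, whereas real experiments also see faults that are never activated or are overwritten.

```python
from typing import Callable, Sequence

# A detector inspects a (possibly corrupted) word and returns True if it flags an error.
Detector = Callable[[int], bool]


def flip_bit(word: int, bit: int) -> int:
    """Invert a single bit of `word` (the single bit-flip fault model)."""
    return word ^ (1 << bit)


def coverage(golden: int, width: int, detectors: Sequence[Detector]) -> float:
    """Inject each possible single-bit flip into `golden` and return the
    fraction of injected faults flagged by at least one detector."""
    detected = 0
    for bit in range(width):
        faulty = flip_bit(golden, bit)
        if any(check(faulty) for check in detectors):
            detected += 1
    return detected / width


# Hypothetical application-level plausibility check: valid values lie in [0, 1000].
def range_check(value: int) -> bool:
    return not (0 <= value <= 1000)


# Hypothetical comparison against a replica held elsewhere and unaffected by the fault.
REPLICA = 500


def replica_check(value: int) -> bool:
    return value != REPLICA


if __name__ == "__main__":
    # Flips of bits 9..15 push 500 above 1000; flips of bits 0..8 stay in range.
    print(coverage(500, 16, [range_check]))                 # 0.4375
    print(coverage(500, 16, [range_check, replica_check]))  # 1.0
```

The range check alone misses every flip that leaves the value plausible, which gives 7 of 16 detected. Adding a second, independent mechanism catches the remainder. This loosely mirrors the abstract's observation that combining mechanisms raises coverage, though the numbers here say nothing about MARS itself.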
doi:10.1007/3-540-61772-8_31 fatcat:vyvgkos3rfcblfujjgs2l56u3m