Assessing Asymmetric Fault-Tolerant Software

Peter Popov, Lorenzo Strigini
2010 IEEE 21st International Symposium on Software Reliability Engineering
The most popular forms of fault tolerance against design faults use "asymmetric" architectures in which a "primary" part performs the computation and a "secondary" part is in charge of detecting errors and performing some kind of error processing and recovery. In contrast, the most studied forms of software fault tolerance are "symmetric" ones, e.g. N-version programming. The latter are often controversial, the former are not. We discuss how to assess the dependability gains achieved by these methods. Substantial difficulties have been shown to exist for symmetric schemes, but we show that the same difficulties affect asymmetric schemes. Indeed, the latter present somewhat subtler problems. In both cases, to predict the dependability of the fault-tolerant system it is not enough to know the dependability of the individual components. We extend to asymmetric architectures the style of probabilistic modeling that has been useful for describing the dependability of "symmetric" architectures, to highlight factors that complicate the assessment. In the light of these models, we finally discuss fault injection approaches to estimating coverage factors. We highlight the limits of what can be predicted and some useful research directions towards clarifying and extending the range of situations in which estimates of coverage of fault tolerance mechanisms can be trusted.
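The claim that component-level dependability figures do not determine system dependability can be illustrated with a toy calculation. The sketch below is not the paper's model; it assumes a hypothetical space of ten equally likely demands, a primary that fails on two of them, and two hypothetical checkers that each miss errors on two demands. All names and numbers are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical demand profile: ten demands, each equally likely.
profile = {d: Fraction(1, 10) for d in range(10)}

def marginal(profile, demands):
    """Probability that a randomly chosen demand falls in `demands`."""
    return sum((p for d, p in profile.items() if d in demands), Fraction(0))

def undetected_failure_prob(profile, primary_fails, checker_misses):
    """Probability of a demand on which the primary fails AND the checker misses it."""
    return marginal(profile, primary_fails & checker_misses)

# Assumed failure regions (illustrative only).
primary_fails = {0, 1}   # primary fails on 20% of demands
misses_a = {0, 1}        # checker A misses exactly where the primary fails
misses_b = {8, 9}        # checker B misses only where the primary is correct

# Both checkers have the same marginal miss probability (1/5), so a
# calculation that multiplies component figures gives one answer for both.
naive = marginal(profile, primary_fails) * marginal(profile, misses_a)

actual_a = undetected_failure_prob(profile, primary_fails, misses_a)
actual_b = undetected_failure_prob(profile, primary_fails, misses_b)

print("naive product:", naive)
print("checker A:", actual_a)
print("checker B:", actual_b)
```

Under these assumptions the two systems have identical component-level figures but different probabilities of undetected failure, because what matters is how the checker behaves on the demands where the primary actually fails.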
doi:10.1109/issre.2010.10 dblp:conf/issre/PopovS10 fatcat:ssau5h2aafbztgzbzt6xij5loi