Improving availability with recursive microreboots: a soft-state system case study
2003
Performance evaluation (Print)
The rest of this paper describes a recursive approach to recovering large systems (section 2), followed by a case study in which we applied microreboots to a recursively recoverable satellite ground station ...
From our experience with Mercury, we draw design guidelines and lessons for the application of recursive microreboots to other software systems. ...
Faculty Award, and the USENIX Society through a student scholarship. ...
doi:10.1016/s0166-5316(03)00136-6
fatcat:6k535oiu3neofpvphgulboxdjq
Improving availability with recursive microreboots: a soft-state system case study
2004
Performance evaluation (Print)
The rest of this paper describes a recursive approach to recovering large systems (section 2), followed by a case study in which we applied microreboots to a recursively recoverable satellite ground station ...
From our experience with Mercury, we draw design guidelines and lessons for the application of recursive microreboots to other software systems. ...
Faculty Award, and the USENIX Society through a student scholarship. ...
doi:10.1016/j.peva.2003.07.007
fatcat:nyi64kprgjdmhfmrl2mnndrz3q
End-User Effects of Microreboots in Three-Tiered Internet Systems
[article]
2004
arXiv
pre-print
improves availability. ...
Microreboots restart fine-grained components of software systems "with a clean slate," and only take a fraction of the time needed for full system reboot. ...
The current release of our software is available for download at http://crash.stanford.edu/download; future versions will be posted there as well. ...
arXiv:cs/0403007v1
fatcat:jzevfnb55rhzfndgvqydisl4hq
A framework for evaluating quality-driven self-adaptive software systems
2011
Proceeding of the 6th international symposium on Software engineering for adaptive and self-managing systems - SEAMS '11
Over the past decade the dynamic capabilities of self-adaptive software-intensive systems have proliferated and improved significantly. ...
Our framework is based on a survey of self-adaptive system papers and a set of adaptation properties derived from control theory properties. ...
In the self-healing approach based on recursive microreboots proposed by Candea et al., availability is evaluated in terms of mean time to recover (MTTR) [4]. ...
doi:10.1145/1988008.1988020
dblp:conf/icse/VillegasMTDC11
fatcat:ru7z6b2fzrbi7etxlrf4uvqcra
Assessment and Improvement of Hang Detection in the Linux Operating System
2009
2009 28th IEEE International Symposium on Reliable Distributed Systems
supported by a field data study on the Linux OS. ...
Using the proposed fault injection framework, along with realistic workloads, we find that the Linux OS is unable to detect hangs in several cases. We experience a relative coverage of 75%. ...
This may happen in the case of recursive functions, and in the case of two functions calling each other, which use the same lock. 2) A set of two or more locks is improperly managed (20 comments
C ...
doi:10.1109/srds.2009.26
dblp:conf/srds/CotroneoNR09
fatcat:4imj6vrvyvcfpjeik4obfc2w3e
Execution transactions for defending against software failures: use and evaluation
2006
International Journal of Information Security
Our performance benchmarks indicate a slow-down of 20% for Apache in full-protection mode, and 1.2% with selective protection. ...
We combine our defensive mechanism with a honeypot-like configuration to detect previously unknown attacks, automatically adapt an application's defensive posture at a negligible performance cost, and ...
are instrumented with our system (for example, a worst-case microbenchmark measurement indicates a 440% slowdown). ...
doi:10.1007/s10207-006-0083-6
fatcat:6wtsvxahznbill35gog3pjm4lm
A survey on self-healing systems: approaches and systems
2010
Computing
This fostered substantial research on designs and techniques that enhance these systems with an autonomous behavior. ...
In a final discussion, we summarize the approaches' common and individual characteristics. A comprehensive tabular overview of the researched material concludes the survey. ...
The recovery-by-rerun scheme is described as recursive microreboots that reboot units recursively according to the dependencies held in the referencing reboot tree. ...
doi:10.1007/s00607-010-0107-y
fatcat:bu6kxztswferrgelipkj4txlie
Error-Efficient Computing Systems
2017
Foundations and Trends® in Electronic Design Automation
The resource may also be energy: A system may use less power from its batteries or from the electrical grid by only avoiding certain errors while tolerating benign errors that are associated with reduced ...
The resource in question may be an even more abstract quantity such as consistency of ordering of the outputs of a system. ...
When Lax is integrated into a sophisticated operating system, such changes might still suffice, or may be augmented with, e.g., techniques such as microreboots [Candea et al., 2004] or tools such as ...
doi:10.1561/1000000049
fatcat:a5edy4lnerbnnb2jh2otle62dq
The 7U Evaluation Method: Evaluating Software Systems via Runtime Fault-Injection and Reliability, Availability and Serviceability (RAS) Metrics and Models
2017
First, developing (or identifying) practical fault-injection tools that can be used to study the failure behavior of computing systems and exercise any (remediation) mechanisms the system has available ...
(self-managing/self-*) systems, which are expected to meet these non-functional requirements with minimal human intervention. ...
Microreboot RAS Model Recursive microreboots are a technique for improving overall system availability by reactively restarting failed components and rejuvenating functioning components to prevent degradation ...
doi:10.7916/d8r2187c
fatcat:q5fk5u7aercs7jt5etagaxyufy
Aspect-oriented technology for dependable operating systems
2017
To evaluate AOP as a means to improve the dependability of operating systems, this thesis presents the design and implementation of a library of aspect-oriented fault-tolerance mechanisms. ...
Therefore, dependable computer systems must incorporate methods of fault tolerance to cope with transient faults. ...
Case study: hardening L4/Fiasco.OC. The second case study applies the dependability aspects to the operating system L4/Fiasco.OC, which represents a state-of-the-art microkernel as described in Section ...
doi:10.17877/de290r-17995
fatcat:4lowsbiyx5ckjglqa3ena4zy4a