Improving availability with recursive microreboots: a soft-state system case study

G Candea
2003 Performance evaluation (Print)  
The rest of this paper describes a recursive approach to recovering large systems (section 2), followed by a case study in which we applied microreboots to a recursively recoverable satellite ground station  ...  From our experience with Mercury, we draw design guidelines and lessons for the application of recursive microreboots to other software systems.  ...  Faculty Award, and the USENIX Society through a student scholarship.  ... 
doi:10.1016/s0166-5316(03)00136-6 fatcat:6k535oiu3neofpvphgulboxdjq

Improving availability with recursive microreboots: a soft-state system case study

George Candea, James Cutler, Armando Fox
2004 Performance evaluation (Print)  
The rest of this paper describes a recursive approach to recovering large systems (section 2), followed by a case study in which we applied microreboots to a recursively recoverable satellite ground station  ...  From our experience with Mercury, we draw design guidelines and lessons for the application of recursive microreboots to other software systems.  ...  Faculty Award, and the USENIX Society through a student scholarship.  ... 
doi:10.1016/j.peva.2003.07.007 fatcat:nyi64kprgjdmhfmrl2mnndrz3q

End-User Effects of Microreboots in Three-Tiered Internet Systems [article]

George Candea, Armando Fox
2004 arXiv   pre-print
improves availability.  ...  Microreboots restart fine-grained components of software systems "with a clean slate," and only take a fraction of the time needed for full system reboot.  ...  The current release of our software is available for download at http://crash.stanford.edu/download; future versions will be posted there as well.  ... 
arXiv:cs/0403007v1 fatcat:jzevfnb55rhzfndgvqydisl4hq

A framework for evaluating quality-driven self-adaptive software systems

Norha M. Villegas, Hausi A. Müller, Gabriel Tamura, Laurence Duchien, Rubby Casallas
2011 Proceeding of the 6th international symposium on Software engineering for adaptive and self-managing systems - SEAMS '11  
Over the past decade the dynamic capabilities of self-adaptive software-intensive systems have proliferated and improved significantly.  ...  Our framework is based on a survey of self-adaptive system papers and a set of adaptation properties derived from control theory properties.  ...  In the self-healing approach based on recursive microreboots proposed by Candea et al., availability is evaluated in terms of mean time to recover (MTTR) [4].  ... 
doi:10.1145/1988008.1988020 dblp:conf/icse/VillegasMTDC11 fatcat:ru7z6b2fzrbi7etxlrf4uvqcra

Assessment and Improvement of Hang Detection in the Linux Operating System

Domenico Cotroneo, Roberto Natella, Stefano Russo
2009 2009 28th IEEE International Symposium on Reliable Distributed Systems  
supported by a field data study on the Linux OS.  ...  Using the proposed fault injection framework, along with realistic workloads, we find that the Linux OS is unable to detect hangs in several cases. We experience a relative coverage of 75%.  ...  This may happen in the case of recursive functions, and in the case of two functions calling each other, which use the same lock. 2) A set of two or more locks is improperly managed (20 comments C  ... 
doi:10.1109/srds.2009.26 dblp:conf/srds/CotroneoNR09 fatcat:4imj6vrvyvcfpjeik4obfc2w3e

Execution transactions for defending against software failures: use and evaluation

Stelios Sidiroglou, Angelos D. Keromytis
2006 International Journal of Information Security  
Our performance benchmarks indicate a slow-down of 20% for Apache in full-protection mode, and 1.2% with selective protection.  ...  We combine our defensive mechanism with a honeypot-like configuration to detect previously unknown attacks, automatically adapt an application's defensive posture at a negligible performance cost, and  ...  are instrumented with our system (for example, a worst-case microbenchmark measurement indicates a 440% slowdown).  ... 
doi:10.1007/s10207-006-0083-6 fatcat:6wtsvxahznbill35gog3pjm4lm

A survey on self-healing systems: approaches and systems

Harald Psaier, Schahram Dustdar
2010 Computing  
This fostered substantial research on designs and techniques that enhance these systems with an autonomous behavior.  ...  In a final discussion, we summarize the approaches' common and individual characteristics. A comprehensive tabular overview of the researched material concludes the survey.  ...  The recovery by rerun scheme is described as recursive microreboots that reboot units recursively according to the dependencies hold in the referencing reboot tree.  ... 
doi:10.1007/s00607-010-0107-y fatcat:bu6kxztswferrgelipkj4txlie

Error-Efficient Computing Systems

Phillip Stanley-Marbell, Martin Rinard
2017 Foundations and Trends® in Electronic Design Automation  
The resource may also be energy: A system may use less power from its batteries or from the electrical grid by only avoiding certain errors while tolerating benign errors that are associated with reduced  ...  The resource in question may be an even more abstract quantity such as consistency of ordering of the outputs of a system.  ...  When Lax is integrated into a sophisticated operating system, such changes might still suffice, or may be augmented with, e.g., techniques such as microreboots [Candea et al., 2004] or tools such as  ... 
doi:10.1561/1000000049 fatcat:a5edy4lnerbnnb2jh2otle62dq

The 7U Evaluation Method: Evaluating Software Systems via Runtime Fault-Injection and Reliability, Availability and Serviceability (RAS) Metrics and Models

Rean Griffith, Columbia University. Computer Science
2017
First, developing (or identifying) practical fault-injection tools that can be used to study the failure behavior of computing systems and exercise any (remediation) mechanisms the system has available  ...  (self-managing/self-*) systems, which are expected to meet these non-functional requirements with minimal human intervention.  ...  Microreboot RAS Model Recursive microreboots are a technique for improving overall system availability by reactively restarting failed components and rejuvenating functioning components to prevent degradation  ... 
doi:10.7916/d8r2187c fatcat:q5fk5u7aercs7jt5etagaxyufy

Aspect-oriented technology for dependable operating systems

Christoph Borchert, Technische Universität Dortmund
2017
To evaluate AOP as a means to improve the dependability of operating systems, this thesis presents the design and implementation of a library of aspect-oriented fault-tolerance mechanisms.  ...  Therefore, dependable computer systems must incorporate methods of fault tolerance to cope with transient faults.  ...  case study: hardening l4/fiasco.oc The second case study applies the dependability aspects to the operating system L4/Fiasco.OC, which represents a state-of-the-art microkernel as described in Section  ... 
doi:10.17877/de290r-17995 fatcat:4lowsbiyx5ckjglqa3ena4zy4a