A Survey of Linguistic Structures for Application-level Fault-Tolerance [article]

Vincenzo De Florio, Chris Blondia
2015 arXiv pre-print
In this text we first define a "base" of structural attributes with which application-level fault-tolerance structures can be qualitatively assessed and compared with each other and with respect to the  ...  Structuring techniques answer the questions "How to incorporate fault-tolerance in the application layer of a computer program" and "How to manage the fault-tolerant code".  ...  ACKNOWLEDGMENT We would like to express our gratitude to the Editor for the many insightful remarks and suggestions.  ... 
arXiv:1504.03256v1 fatcat:u4mawglm3zh5jlu2i3aizcfzju

Performance Comparison of Retry and N-Copy Software Fault Tolerance Techniques

2020 International Journal of Emerging Trends in Engineering Research  
In this research article an attempt is made to compare the performance of the following software fault tolerance techniques: 1) Retry Block (RtB) and 2) N-copy programming.  ...  There are different fault tolerance techniques that help software engineers prevent software system failures.  ...  It makes use of an acceptance test and a backward recovery approach to achieve fault tolerance.  ... 
doi:10.30534/ijeter/2020/42892020 fatcat:lishjnoa2zca3dlhbzmdzypxuq
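The Retry Block snippet above names its two ingredients: an acceptance test and backward recovery. A minimal Python sketch of that pattern follows; it is not the article's code. The names `retry_block`, `buggy_sort`, `is_sorted_permutation`, and `rotate` are hypothetical, and the sketch assumes the re-expression function (data diversity) produces an input on which the same algorithm must return an equivalent, checkable answer.

```python
import copy
from collections import Counter
from typing import Callable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class RetryBlockFailure(Exception):
    """Raised when no attempt passes the acceptance test."""


def retry_block(
    primary: Callable[[T], R],
    acceptance_test: Callable[[T, R], bool],
    reexpress: Callable[[T, int], T],
    x: T,
    max_retries: int = 3,
) -> R:
    # Backward recovery: save the original input once; every attempt starts
    # from a fresh copy of this checkpoint, so a faulty run cannot corrupt it.
    checkpoint = copy.deepcopy(x)
    for attempt in range(max_retries + 1):
        candidate = copy.deepcopy(checkpoint)
        if attempt > 0:
            # Data diversity: re-express the input and rerun the SAME algorithm.
            candidate = reexpress(candidate, attempt)
        try:
            result = primary(candidate)
        except Exception:
            continue  # an exception counts as a failed attempt
        if acceptance_test(checkpoint, result):
            return result
    raise RetryBlockFailure(f"no result accepted after {max_retries + 1} attempts")


# --- Hypothetical example: sorting with a faulty primary routine ---

def buggy_sort(xs: List[int]) -> List[int]:
    # Simulated design fault: returns the input unchanged whenever the
    # first element is the maximum.
    if xs and xs[0] == max(xs):
        return xs
    return sorted(xs)


def is_sorted_permutation(original: List[int], result: List[int]) -> bool:
    # Acceptance test: same multiset of elements, in non-decreasing order.
    return Counter(original) == Counter(result) and all(
        a <= b for a, b in zip(result, result[1:])
    )


def rotate(xs: List[int], k: int) -> List[int]:
    # Re-expression: a rotation is a permutation, so the sorted output is unchanged.
    k = k % len(xs) if xs else 0
    return xs[k:] + xs[:k]


print(retry_block(buggy_sort, is_sorted_permutation, rotate, [3, 1, 2]))  # [1, 2, 3]
```

The first attempt on `[3, 1, 2]` trips the simulated fault and fails the acceptance test; the retry on the rotated input `[1, 2, 3]` avoids it. N-copy programming, the other technique in the comparison, would instead run several re-expressed copies and adjudicate among their results.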

A Systematic Review of Fault Tolerance in Mobile Agents

Bassey Echeng Isong
2013 American Journal of Software Engineering and Applications  
The implication of the study is to give a clear direction to future researchers in this area toward more reliable and transparent fault tolerance in mobile agents.  ...  in mobile agents' fault tolerance approaches.  ...  Collaborative Agents: Here, three or more types of agents with designated responsibilities in the detection and recovery processes work together to achieve fault tolerance, with a clear division  ... 
doi:10.11648/j.ajsea.20130205.11 fatcat:a34pch6a6jgd5fcucxgaup5lge

A multi-level view of dependable computing

Behrooz Parhami
1994 Computers & Electrical Engineering
This paper serves a dual purpose. It presents a unified framework and terminology for the study of computer system dependability.  ...  If a fault is actually exercised, it may contaminate the data flowing within the system, causing errors.  ...  Acknowledgement--The research reported in this paper was initiated in 1987 at the University of Waterloo, where the author was a Visiting Professor supported in part by the Natural Sciences and Engineering  ... 
doi:10.1016/0045-7906(94)90048-5 fatcat:q3fwsc6eivab3l47wx6vtv6vtq

Algorithm-based fault tolerance for dense matrix factorizations

Peng Du, Aurelien Bouteiller, George Bosilca, Thomas Herault, Jack Dongarra
2012 SIGPLAN Notices
The fault-tolerant algorithms derived from this hybrid solution are applicable to a wide range of dense matrix factorizations, with minor modifications.  ...  This paper proposes a new hybrid approach, based on Algorithm-Based Fault Tolerance (ABFT), to help matrix factorization algorithms survive fail-stop failures.  ...  The performance of both the original and fault-tolerant versions is reported in Tflop/s. This experiment is carried out to test weak scalability, where both the matrix and grid dimensions double.  ... 
doi:10.1145/2370036.2145845 fatcat:aj5ivcrf25f4reulnpy6cpcdiy

Algorithm-based fault tolerance for dense matrix factorizations

Peng Du, Aurelien Bouteiller, George Bosilca, Thomas Herault, Jack Dongarra
2012 Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming - PPoPP '12  
The fault-tolerant algorithms derived from this hybrid solution are applicable to a wide range of dense matrix factorizations, with minor modifications.  ...  This paper proposes a new hybrid approach, based on Algorithm-Based Fault Tolerance (ABFT), to help matrix factorization algorithms survive fail-stop failures.  ...  The performance of both the original and fault-tolerant versions is reported in Tflop/s. This experiment is carried out to test weak scalability, where both the matrix and grid dimensions double.  ... 
doi:10.1145/2145816.2145845 dblp:conf/ppopp/DuBBHD12 fatcat:cyc73fwdtvhhve7gjzi6vyc7ne

Handling Software Faults with Redundancy [chapter]

Antonio Carzaniga, Alessandra Gorla, Mauro Pezzè
2009 Lecture Notes in Computer Science  
Software engineering methods can increase the dependability of software systems, and yet some faults escape even the most rigorous and methodical development process.  ...  In this chapter, we focus on software techniques to handle software faults, and we survey several such techniques developed in the area of fault tolerance and more recently in the area of autonomic computing  ...  In fact, recovery blocks detect component failures by executing explicitly designed acceptance tests.  ... 
doi:10.1007/978-3-642-10248-6_7 fatcat:py277cbonvcypf6caujd7o5soe
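The recovery-block snippet above (explicit acceptance tests deciding whether a component's result is usable) can be illustrated with a small Python sketch. It is a generic rendering of the classic "ensure ... by ... else by ... else error" structure, not code from the chapter; `recovery_block`, `faulty_mean`, `simple_mean`, and `mean_in_range` are hypothetical names, and the state is assumed to be a deep-copyable dictionary so a recovery point can be restored before each alternate runs.

```python
import copy
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


class RecoveryBlockFailure(Exception):
    """Raised when every alternate fails the acceptance test."""


def recovery_block(
    state: State,
    alternates: List[Callable[[State], None]],
    acceptance_test: Callable[[State], bool],
) -> State:
    # Structure: ensure <acceptance_test> by alternates[0]
    #            else by alternates[1] ... else error
    recovery_point = copy.deepcopy(state)  # establish the recovery point
    for alternate in alternates:
        # Restore the recovery point before each try, discarding any side
        # effects left by a previous, rejected alternate.
        working = copy.deepcopy(recovery_point)
        try:
            alternate(working)
        except Exception:
            continue  # a crash is treated like a failed acceptance test
        if acceptance_test(working):
            return working
    raise RecoveryBlockFailure("all alternates failed the acceptance test")


# --- Hypothetical example: computing a mean ---

def faulty_mean(s: State) -> None:
    s["mean"] = sum(s["values"])  # design fault: forgets to divide


def simple_mean(s: State) -> None:
    s["mean"] = sum(s["values"]) / len(s["values"])


def mean_in_range(s: State) -> bool:
    # Explicitly designed acceptance test: a mean must lie within [min, max].
    return "mean" in s and min(s["values"]) <= s["mean"] <= max(s["values"])


original = {"values": [2, 4, 6]}
result = recovery_block(original, [faulty_mean, simple_mean], mean_in_range)
print(result["mean"])  # 4.0
```

The primary's result (12) falls outside [2, 6] and is rejected, so the state is rolled back and the alternate runs. The acceptance test here is deliberately weak (it only checks plausibility); a faulty primary that stays inside the range would slip through, which is the usual trade-off in designing acceptance tests.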

A Scalable and Extensible Checkpointing Scheme for Massively Parallel Simulations [article]

Nils Kohl, Johannes Hötzer, Florian Schornbaum, Martin Bauer, Christian Godenschwager, Harald Köstler, Britta Nestler, Ulrich Rüde
2018 arXiv pre-print
We demonstrate the efficiency and robustness of the method with a realistic phase-field simulation originating in the material sciences and with a lattice Boltzmann method implementation.  ...  To recover from a diskless checkpoint during runtime, we realize the recovery algorithms using ULFM MPI.  ...  The authors gratefully acknowledge the Gauss Centre for Supercomputing e.V.  ... 
arXiv:1708.08286v2 fatcat:bkir7z4p5nfcjkkjbz3onfw6fe

Algorithm-Based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures and Accuracy

Aurelien Bouteiller, Thomas Herault, George Bosilca, Peng Du, Jack Dongarra
2015 ACM Transactions on Parallel Computing  
The fault-tolerant algorithms derived from this hybrid solution are applicable to a wide range of dense matrix factorizations, with minor modifications.  ...  This paper proposes a new hybrid approach, based on Algorithm-Based Fault Tolerance (ABFT), to help matrix factorization algorithms survive fail-stop failures.  ...  Large-scale experimental results validate the design of the proposed fault tolerance method by highlighting a decreasing overhead for both LU and QR, and thus a highly scalable approach.  ... 
doi:10.1145/2686892 fatcat:yu4orwb2uncgxbdzyrs6xw5z2e
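The ABFT entries above do not spell out their encoding in these snippets. As an assumption-labeled illustration, the Python sketch below shows only the classic column-checksum idea that ABFT builds on: a checksum row is carried through a linear operation, and one lost row (standing in for data held by a fail-stopped process) is rebuilt from it. It is not the hybrid LU/QR scheme of Du, Bouteiller, et al.; all function names are hypothetical.

```python
from typing import List, Optional

Matrix = List[List[int]]


def add_checksum_row(a: Matrix) -> Matrix:
    # Append a row whose j-th entry is the sum of column j of `a`.
    checksum = [sum(col) for col in zip(*a)]
    return [row[:] for row in a] + [checksum]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain product; any linear update applied to all rows preserves the checksum row.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def checksum_ok(encoded: Matrix) -> bool:
    # Verify that the last row still equals the column sums of the data rows.
    return [sum(col) for col in zip(*encoded[:-1])] == encoded[-1]


def recover_row(encoded: List[Optional[List[int]]], lost: int) -> List[int]:
    # A single lost data row equals the checksum row minus the sum of the
    # surviving data rows (the checksum row itself must survive).
    checksum = encoded[-1]
    survivors = [row for i, row in enumerate(encoded[:-1]) if i != lost]
    return [c - sum(col) for c, col in zip(checksum, zip(*survivors))]


a = [[1, 2], [3, 4], [5, 6]]
b = [[1, 0], [2, 1]]
c = matmul(add_checksum_row(a), b)   # checksum row is carried through the product
print(checksum_ok(c))                # True
damaged = [None if i == 1 else row for i, row in enumerate(c)]  # row 1 "lost"
print(recover_row(damaged, 1))       # [11, 4]
```

Integers keep the equality checks exact; a floating-point version would compare against a tolerance. A single checksum row tolerates one lost row; surviving multiple simultaneous failures needs additional, independently weighted checksums.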

CRAFT: A library for easier application-level Checkpoint/Restart and Automatic Fault Tolerance

2018 IEEE Transactions on Parallel and Distributed Systems  
This work presents the implementation of our C++ based library CRAFT (Checkpoint-Restart and Automatic Fault Tolerance), which serves two purposes.  ...  recovery mechanism.  ...  for Exascale) [48] .  ... 
doi:10.1109/tpds.2018.2866794 fatcat:exthchqwnnf5npli7jchz4jm7u
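CRAFT is a C++ library and its API is not described in the snippet, so the following is only a generic Python sketch of application-level checkpoint/restart, not CRAFT's interface. It assumes the application state fits in a JSON-serializable dictionary; `save_checkpoint`, `load_checkpoint`, and `run` are hypothetical names, and `crash_at` merely simulates a failure.

```python
import json
import os
import tempfile
from typing import Any, Dict

State = Dict[str, Any]


def save_checkpoint(path: str, state: State) -> None:
    # Write to a temporary file in the same directory, then atomically replace
    # the old checkpoint, so a crash mid-write never leaves a torn file.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
    except BaseException:
        os.remove(tmp)  # clean up the partial temporary file
        raise


def load_checkpoint(path: str, default: State) -> State:
    # On (re)start, resume from the last checkpoint if one exists.
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return dict(default)


def run(path: str, n_steps: int, every: int = 10, crash_at: int = -1) -> State:
    state = load_checkpoint(path, {"step": 0, "total": 0})
    while state["step"] < n_steps:
        if state["step"] == crash_at:
            raise RuntimeError(f"simulated failure at step {crash_at}")
        state["total"] += state["step"]  # stand-in for one unit of real work
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    return state


with tempfile.TemporaryDirectory() as d:
    ckpt = os.path.join(d, "state.json")
    try:
        run(ckpt, 100, crash_at=55)   # dies after the step-50 checkpoint
    except RuntimeError:
        pass
    final = run(ckpt, 100)            # restart resumes from step 50
print(final)  # {'step': 100, 'total': 4950}
```

Writing to a temporary file and then calling `os.replace` means a failure during a write leaves the previous checkpoint intact. The interval `every` trades checkpointing overhead against the work recomputed after a restart (steps 50 to 54 here).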

Toward systematic design of fault-tolerant systems

A. Avizienis
1997 Computer  
After 30 years of study and practice in fault tolerance, high-confidence computing remains a costly privilege of several critical applications.  ...  Work with my friends in the Computer Society's Technical Committee on Fault-Tolerant Computing and the IFIP Working Group 10.4 on Dependable Computing and Fault Tolerance has been rewarding and a source  ...  of inspiration for this article.  ... 
doi:10.1109/2.585154 fatcat:z5wsbzr3frf7reeug2ce6kdqqy

Application-layer Fault-Tolerance Protocols [article]

Vincenzo De Florio
2016 arXiv pre-print
The central topic of this book is application-level fault-tolerance, that is, the methods, architectures, and tools that allow one to express a fault-tolerant system in the application software of our computers  ...  Application-level fault-tolerance is a sub-class of software fault-tolerance that focuses on expressing the problems and solutions of fault-tolerance in the top layer of the hierarchy of  ...  Let us denote with R the reliability of the basic component of the system, i.e., exp(−λt), and with R_tmr the reliability of the TMR system based on the same component.  ... 
arXiv:1611.02273v1 fatcat:my2uj2n2hrf4ljzpmbh4zlk57q
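The TMR snippet above stops before giving R_tmr. Under the usual assumptions of a perfect voter and independent module failures, a TMR system works when at least two of its three modules work, so R_tmr = R^3 + 3R^2(1 − R) = 3R^2 − 2R^3. The Python sketch below evaluates this and adds a simple majority voter; the function names and the example failure rate are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Optional, Sequence


def r_component(lam: float, t: float) -> float:
    # Constant failure rate lam: R(t) = exp(-lam * t).
    return math.exp(-lam * t)


def r_tmr(r: float) -> float:
    # 2-out-of-3 must survive; perfect voter, independent failures:
    # R_tmr = R^3 + 3 R^2 (1 - R) = 3 R^2 - 2 R^3
    return 3 * r**2 - 2 * r**3


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    # Return the value produced by at least two of the three replicas, else None.
    value, count = Counter(outputs).most_common(1)[0]
    return value if count >= 2 else None


r = r_component(lam=1e-3, t=100.0)   # hypothetical: 1e-3 failures/hour, 100 hours
print(f"{r:.4f} {r_tmr(r):.4f}")     # 0.9048 0.9746
print(majority_vote([4, 4, 7]))      # 4
```

Since 3R^2 − 2R^3 > R only when 0.5 < R < 1, TMR improves on a single component only while R = exp(−λt) stays above 0.5, that is, for t < ln 2 / λ.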

Enhancing fault tolerance of autonomous mobile robots

D. Crestani, K. Godary-Dejean, L. Lapierre
2015 Robotics and Autonomous Systems  
Even if some research studies exist, there is a lack of a global approach that really integrates dependability and particularly fault tolerance into the mobile robot design.  ...  This paper presents an approach that aims to integrate fault tolerance principles into the design of a robot real-time control architecture.  ...  The experimental tests for fault tolerance were conducted using HIL (Hardware In the Loop).  ... 
doi:10.1016/j.robot.2014.12.015 fatcat:3xfp3ommo5havnlycphm7rwrdy

Survey and future directions of fault-tolerant distributed computing on board spacecraft

Muhammad Fayyaz, Tanya Vladimirova
2016 Advances in Space Research  
Middleware for Fault-Tolerance (AMFT).  ...  Furthermore, there is no suitable method of assessing performance for such a scheme.  ...  If a subprogram passes the acceptance test written as an assertion, it proceeds to the next subprogram. Otherwise, failure of the acceptance test is an indication of a fault.  ... 
doi:10.1016/j.asr.2016.08.017 fatcat:szoac6aiwvbs3d2dyh5smxsgqa

An adaptive approach to achieving hardware and software fault tolerance in a distributed computing environment

A. Bondavalli, S. Chiaradonna, F. Di Giandomenico, J. Xu
2002 Journal of Systems Architecture
Several hybrid-fault-tolerant architectures are identified and proposed.  ...  A method is introduced for evaluating the proposed architectures with respect to reliability, resource utilisation and response time. Examples of quantitative evaluations are also given.  ...  For a given fault-tolerant application, an architecture contains i) a set of software variants designed independently (mainly for coping with residual design faults), ii) an adjudicator [2] (e.g. an acceptance  ... 
doi:10.1016/s1383-7621(01)00029-7 fatcat:s5ou7igesfbu7kgz3zdf6v3pqm