Reliable and total order broadcast in the crash-recovery model

Romain Boichat, Rachid Guerraoui
2005 Journal of Parallel and Distributed Computing  
Devising crash-recovery resilient broadcast primitives goes first through providing meaningful variants of those properties in a crash-recovery model.  ...  In particular, many papers have been published on algorithms that implement reliable and total order broadcast primitives in a crash-stop system model [10, 14, 3, 13, 5, 7] .  ...  model and one would wonder whether the specifications and algorithms devised in the Byzantine model could be used in a crash-recovery model.  ... 
doi:10.1016/j.jpdc.2004.10.008 fatcat:dwjr54hhb5g6pfqezb2looqxta

A new look at atomic broadcast in the asynchronous crash-recovery model

S. Mena, A. Schiper
24th IEEE Symposium on Reliable Distributed Systems (SRDS'05)  
The paper also proposes a new specification of atomic broadcast in the crash-recovery model that addresses these issues.  ...  This allows them to recover a previous state after a crash. However, the existing specifications of atomic broadcast in the crash-recovery model are not satisfactory, and the paper explains why.  ...  We also explain why atomic broadcast in the crash-recovery model is trickier than in the crash-stop model. Atomic broadcast is most of the time used within an application that has a state.  ... 
doi:10.1109/reldis.2005.6 dblp:conf/srds/MenaS05 fatcat:dtrzjpqa4fb7tdsbxhjwvcqfqy

Advances in the Design and Implementation of Group Communication Middleware [chapter]

Daniel Bünzli, Rachele Fuzzati, Sergio Mena, Uwe Nestmann, Olivier Rütti, André Schiper, Paweł T. Wojciechowski
2006 Lecture Notes in Computer Science  
Group communication is a programming abstraction that allows a distributed group of processes to provide a reliable service in spite of the possibility of failures within the group.  ...  The goal of the project was to improve the state of the art of group communication in several directions: protocol frameworks, group communication stacks, specification, verification and robustness.  ...  Atomic broadcast in the crash-recovery model has been considered in [18] .  ... 
doi:10.1007/11808107_8 fatcat:zy6loymyyje5fbun5zhwghru7q


Publishing: A Reliable Broadcast Communication Mechanism

Michael L. Powell, David L. Presotto
1983 Proceedings of the ninth ACM symposium on Operating systems principles - SOSP '83  
Publishing is a model and mechanism for crash recovery in a distributed computing environment.  ...  The prototype version implemented in DEMOS/MP demonstrates that an error recovery can be transparent to user processes and can be centralized in the network.  ...  Messages are assumed to be delivered when they are broadcast, so the receiving nodes do not appear in the model.  ... 
doi:10.1145/800217.806618 dblp:conf/sosp/PowellP83 fatcat:bc3tdxixxzejjdzpv2bszrit4e

Group Communication: From Practice to Theory [chapter]

André Schiper
2006 Lecture Notes in Computer Science  
In this context, the paper surveys techniques that allow to achieve fault tolerance in distributed systems by replication. The main replication techniques are first explained.  ...  Then group communication is introduced as the communication infrastructure that allows the implementation of the different replication techniques.  ...  I would like to thank Sergio Mena and Olivier Rütti for their comments on an earlier version of the paper.  ... 
doi:10.1007/11611257_10 fatcat:cnjwpr7iffdfhp7pastswja6gq

Dependable Systems [chapter]

André Schiper
2006 Lecture Notes in Computer Science  
In this context, the paper surveys techniques that allow to achieve fault tolerance in distributed systems by replication. The main replication techniques are first explained.  ...  Almost the same paper appears under the title Group Communication: from practice to theory in  ...  I would like to thank Sergio Mena and Olivier Rütti for their comments on an earlier version of the paper.  ... 
doi:10.1007/11808107_2 fatcat:otlxfarnp5ekthg6xtsfwro7du

Supporting amnesia in log-based recovery protocols

Rubén de Juan-Marín, Luis Irún-Briz, Francesc D. Muñoz-Escoí
2007 Proceedings of the 2007 Euro American conference on Telematics and information systems - EATIS '07  
This paper points out how the crash-recovery with partial amnesia failure model presents a better accuracy for replicated systems with huge state, but how its use has the amnesia phenomenon drawback.  ...  Then, the paper analyzes this phenomenon and how to deal with it in a basic configuration using a log-based recovery approach.  ...  In other words, to use the crash-recovery with partial amnesia failure model.  ... 
doi:10.1145/1352694.1352720 dblp:conf/eatis/Juan-MarinIM07 fatcat:ftk2tfrkozee7dzcfheeadrveq

Reviewing Amnesia Support in Database Recovery Protocols [chapter]

Rubén de Juan-Marín, Luis H. García-Muñoz, J. Enrique Armendáriz-Íñigo, Francesc D. Muñoz-Escoí
2007 Lecture Notes in Computer Science  
Replicated databases literature last trends consist in adopting the crash-recovery with partial amnesia failure model because in most cases it shortens the recovery times.  ...  An important aspect when designing these systems is the assumed failure model.  ...  The most commonly used failure models in replicated databases are fail-stop and crash-recovery with partial amnesia, as defined in [10] .  ... 
doi:10.1007/978-3-540-76848-7_48 fatcat:cxkotz4czze6ddmzhv7j3w6v4y

Beyond 1-Safety and 2-Safety for Replicated Databases: Group-Safety [chapter]

Matthias Wiesmann, André Schiper
2004 Lecture Notes in Computer Science  
In this paper, we study the safety guarantees of group communication-based database replication techniques.  ...  We propose a new group communication primitive called end-to-end atomic broadcast that solves the problem, i.e., can be used to implement 2-safe database replication.  ...  Dynamic crash no-recovery model The dynamic crash no-recovery model has been introduced in the Isis system [5] , and is also sometimes called the view based model.  ... 
doi:10.1007/978-3-540-24741-8_11 fatcat:2phjbaidovct7pbv7acodotmca

A single-phase non-blocking atomic commitment protocol [chapter]

Maha Abdallah, Philippe Pucheral
1998 Lecture Notes in Computer Science  
Transactional standards have been promoted by OMG and X/Open to allow heterogeneous resources to participate in an Atomic Commitment Protocol (ACP), namely the two-phase commit protocol (2PC).  ...  This protocol relies on the assumption that all participants are ruled by a rigorous concurrency control protocol.  ...  In such a model, site failures can be reliably detected by any reliable failure detector and reported to any operational site.  ... 
doi:10.1007/bfb0054516 fatcat:n5ck764fn5brhen4p3xkvurgkm

Page 214 of IEEE Transactions on Computers Vol. 52, Issue 2 [page]

2003 IEEE Transactions on Computers  
Agent a> thereby always uses a model of recovery, where no partial state survives a crash.  ...  In summary, the asynchrony assumption thus forces us indirectly to support the recovery of agents after a crash 5.2 Building Block 2: Reliably Forwarding the Agent between S, and S Having solved the problem  ... 

Consensus in Asynchronous Distributed Systems: A Concise Guided Tour [chapter]

Rachid Guerraoui, Michel Hurfin, Achour Mostefaoui, Rui Carlos Oliveira, Michel Raynal, Andre Schiper
2000 Lecture Notes in Computer Science  
It studies Consensus in two failure models, namely, the Crash/no Recovery model and the Crash/Recovery model.  ...  The assumptions related to the detection of failures that are required to solve Consensus in a given model are particularly emphasized.  ...  Section 3 and Section 6 provide instantiations of what is a good/bad process, in the Crash/no Recovery model and in the Crash/Recovery model, respectively.  ... 
doi:10.1007/3-540-46475-1_2 fatcat:2mcrbzsrv5cejcfqvmbvvnj25a

Replication of Recovery Log — An Approach to Enhance SOA Reliability [chapter]

Anna Kobusińska, Dariusz Wawrzyniak
2015 Lecture Notes in Computer Science  
In this paper we propose to enhance the resilience of ReServE by replication of log with recovery information, and address problems related to deployment of this solution.  ...  To improve reliability of SOA-based systems and applications, a ReServE service, providing an external support of web services recovery, has been designed.  ...  Additionally, the crash-recovery model of failures is assumed, i.e., system components may fail and recover after crashing a finite number of times [1].  ... 
doi:10.1007/978-3-319-19129-4_12 fatcat:lqvfmrto75hclj4j63wokemfx4

FTRepMI: Fault-Tolerant, Sequentially-Consistent Object Replication for Grid Applications [chapter]

Ana-Maria Oprescu, Thilo Kielmann, Wan Fokkink
2008 Lecture Notes in Computer Science  
FTRepMI supports dynamic joins and graceful leaves of processes holding a replica, as well as failstop crashes.  ...  We introduce FTRepMI, a simple fault-tolerant protocol for providing sequential consistency amongst replicated objects in a grid, without using any centralized components.  ...  We thank Niels Drost and Rena Bakhshi for their helpful comments, and Stefan Blom for his help with the μCRL model checking exercise.  ... 
doi:10.1007/978-3-540-92295-7_44 fatcat:3l5ouqffybbdrhecgp6dpposqy

A Comprehensive Study on Failure Detectors of Distributed Systems

Bhavana Chaurasia, Anshul Verma
2020 Journal of scientific research  
In distributed systems, failure detectors are used to monitor the processes and to reduce the risk of failures by detecting them before system crashes.  ...  In this paper various failure detector algorithms are discussed.  ...  The crash-recovery model is an advanced version of the crash failure model in which a crashed process can be recovered.  ... 
doi:10.37398/jsr.2020.640235 fatcat:znckxyrnnnf3npesjkjtifjde4