13,146 Hits in 4.6 sec

A Failure Detector for Crash Recovery Systems in Cloud

Bharati Sinha (National Institute of Technology, Kurukshetra, India, 136119), Awadhesh Kumar Singh, Poonam Saini
2019 International Journal of Information Technology and Computer Science  
The paper proposes a failure detector to handle crash recoverable nodes and the system recovery is performed by a designated checkpoint in the event of failure.  ...  Therefore, fault detection and recovery is gaining attention in cloud research community. The Failure Detectors (FDs) are modules employed at the nodes to perform fault detection.  ...  The algorithm considers variable timeout rather than fixed one. The failure detectors have also been proposed for crash recovery systems by Aguilera et al. [10].  ... 
doi:10.5815/ijitcs.2019.07.02 fatcat:24kyjp6l5raulbnqktbb4aqi7q

Quality of Service of an Asynchronous Crash-Recovery Leader Election Algorithm [article]

Vinícius A. Reis, Gustavo M. D. Vieira
2017 arXiv   pre-print
This paper presents and analyzes the behavior of a new leader election algorithm named NFD-L for the asynchronous crash-recovery failure model that is efficient in terms of its use of stable memory and  ...  In asynchronous distributed systems it is very hard to assess if one of the processes taking part in a computation is operating correctly or has failed.  ...  Acknowledgments We would like to thank Priscila Aiko Someda Dias for her valuable support during the statistical analysis of this work.  ... 
arXiv:1704.06302v1 fatcat:bwns4gmvrjan3akumhwfojdp4e

XNET: a reliable content-based publish/subscribe system

R. Chand, P. Felber
2004 Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems, 2004.  
We analyze the efficiency of our techniques in a large scale experimental deployment on the PlanetLab testbed.  ...  A major challenge of such middleware infrastructures is their reliability and their ability to cope with failures in the system.  ...  [Figure panel titles: Recovery delay for router 19 (1 min, 5 min, and 10 min crash); Recovery delay for router 2 (1 min crash, 5  ... ]
doi:10.1109/reldis.2004.1353027 dblp:conf/srds/ChandF04 fatcat:cn5vwo54mfbwhigzf3qy6bojlm

Consensus in Asynchronous Distributed Systems: A Concise Guided Tour [chapter]

Rachid Guerraoui, Michel Hurfin, Achour Mostefaoui, Rui Carlos Oliveira, Michel Raynal, Andre Schiper
2000 Lecture Notes in Computer Science  
It studies Consensus in two failure models, namely, the Crash/no Recovery model and the Crash/Recovery model.  ...  It is now recognized that the Consensus problem is a fundamental problem when one has to design and implement reliable asynchronous distributed systems. This chapter is on the Consensus problem.  ...  Algorithms An algorithm for solving Consensus in the Crash/Recovery model without requiring stable storage has been proposed in [1].  ... 
doi:10.1007/3-540-46475-1_2 fatcat:2mcrbzsrv5cejcfqvmbvvnj25a

Study of various Election algorithms on the basis of messagepassing approach

Pooja B. Raval
2012 IOSR Journal of Computer Engineering  
An important challenge in distributed systems is the adoption of suitable and efficient algorithms for coordination selection.  ...  of various coordinator selection algorithms in distributed systems.  ...  C.Modified Bully Algorithm [6] Modified Bully algorithm, an efficient version Bully algorithm to minimize redundancy in electing the coordinator and to reduce the recovery problem of a crashed process.  ... 
doi:10.9790/0661/0812327 fatcat:uws6kslsdfbszfdydfhf5hnga4

You Only Live Multiple Times: A Blackbox Solution for Reusing Crash-Stop Algorithms In Realistic Crash-Recovery Settings

David Kozhaya, Ognjen Maric, Yvonne-Anne Pignolet, Michael Wagner
2018 International Conference on Principles of Distributed Systems  
Using this transformation, many algorithms written for the asynchronous crash-stop model run correctly and unchanged in real systems.  ...  Distributed agreement-based algorithms are often specified in a crash-stop asynchronous model augmented by Chandra and Toueg's unreliable failure detectors.  ...  We showed how and under which conditions we can reuse existing crash-stop distributed algorithms in our crash-recovery systems.  ... 
doi:10.4230/lipics.opodis.2018.19 dblp:conf/opodis/KozhayaMP18 fatcat:bf6glsurpbd3xnl3a5tudg4whe

A Relaxed-Ring for Self-Organising and Fault-Tolerant Peer-to-Peer Networks

Boris Mejías, Peter Van Roy
2007 Proceedings of the International Conference of the Chilean Computer Science Society (SCCC)  
There is no doubt about the increase in popularity of decentralised systems over the classical client-server architecture in distributed applications.  ...  By increasing self-management in the system we are able to deal with these issues. We model ring maintenance as a self-organising and self-healing system using feedback loops.  ...  Section 4.3 gives more details about the recovery algorithms. Initially, we do not use extra fingers for recovery because it is not efficient.  ... 
doi:10.1109/sccc.2007.4396973 fatcat:jpd7627qfbacdg6hqe6vd2j5ne

Replication-Based Fault-Tolerance for Large-Scale Graph Processing

Peng Wang, Kaiyuan Zhang, Rong Chen, Haibo Chen, Haibing Guan
2014 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks  
This paper observes that the vertex replicas created for distributed graph computation can be naturally extended for fast in-memory recovery of graph states.  ...  Unfortunately, existing large-scale graph-parallel systems usually adopt a distributed checkpoint mechanism for fault tolerance, which incurs not only notable performance overhead but also lengthy recovery  ...  Acknowledgment We thank the anonymous reviewers for their insightful comments. This work is supported in part by Doctoral  ... 
doi:10.1109/dsn.2014.58 dblp:conf/dsn/WangZCCG14 fatcat:vfuicg3rqrf3lbivizr52ggwr4

A Relaxed-Ring for Self-Organising and Fault-Tolerant Peer-to-Peer Networks

Boris Mejías, Peter Van Roy
2007 XXVI International Conference of the Chilean Society of Computer Science (SCCC'07)  
There is no doubt about the increase in popularity of decentralised systems over the classical client-server architecture in distributed applications.  ...  By increasing self-management in the system we are able to deal with these issues. We model ring maintenance as a self-organising and self-healing system using feedback loops.  ...  Section 4.3 gives more details about the recovery algorithms. Initially, we do not use extra fingers for recovery because it is not efficient.  ... 
doi:10.1109/sccc.2007.15 dblp:conf/sccc/BrichauRM07 fatcat:y4f4vspz4zhmdgxg2n3hev5rg4

Modified Bully Election Algorithm for Crash Recovery in Distributed Systems

2017 International Journal of Science and Research (IJSR)  
In this paper, we compare the base and efficient versions of the bully algorithm to minimize the number of messages during the election and when a process recovers from a crashed state in distributed systems  ...  Therefore, election algorithms are very important in any distributed system.  ...  This research tries to reduce network traffic present in distributed systems during leader election and process recovery.  ... 
doi:10.21275/art20178852 fatcat:m4fnhgct3nemborjeqlofhw2re

A Hybrid Fault Tolerance System for Distributed Environment using Check Point Mechanism and Replication

S. Veera, S. Gavaskar, A. Sumithra
2017 International Journal of Computer Applications  
The efficiency of the algorithm depends on how much replication is done and to what extent fault tolerance has been achieved.  ...  We have here proposed a new method which uses both checkpointing and replication to ensure consistency in the distributed environment. Our method is also easy to implement.  ...  In case of fault, the most important issue is efficient recovery in dynamic heterogeneous systems. Recovery under different numbers of processors is highly desirable.  ... 
doi:10.5120/ijca2017912614 fatcat:ze5kjjc2wffm7ecyzjhidrvnfi

A Comprehensive Study on Failure Detectors of Distributed Systems

Bhavana Chaurasia, Anshul Verma
2020 Journal of scientific research  
In distributed systems, failure detectors are used to monitor the processes and to reduce the risk of failures by detecting them before system crashes.  ...  The paper helps readers for the enhancement of knowledge about the basics of failure detectors and the different algorithms which are developed to solve the failure detection problems of distributed systems  ...  Crash, crash-recovery, general omission, and byzantine are some types of process failures found in a distributed system.  ... 
doi:10.37398/jsr.2020.640235 fatcat:znckxyrnnnf3npesjkjtifjde4

A simple and communication-efficient Omega algorithm in the crash-recovery model

Cristian Martín, Mikel Larrea
2010 Information Processing Letters  
This paper presents a new algorithm implementing the Omega failure detector in the crash-recovery model.  ...  Since stable storage is not used to keep the identity of the leader in order to read it upon recovery, unstable processes, i.e., those that crash and recover infinitely often, output a special ⊥ value  ...  Fig. 1. Communication-efficient Omega algorithm in the crash-recovery model.  ...  in order to monitor its current leader.  ... 
doi:10.1016/j.ipl.2009.10.011 fatcat:iykjppsyozacfepggstgkhn5ki

Study on Election Algorithm in Distributed System

Hetal Katwala
2012 IOSR Journal of Computer Engineering  
In distributed system, electing a leader for the various coordination activities is an important issue. This paper present different election algorithm with different approach.  ...  This paper proposes a comparative analysis of the various election algorithms in distributed system.  ...  Existing Algorithms Many algorithms have been proposed for electing leaders in distributed systems 1. Bully algorithm proposed by Hector Garcia Molina in 1982. 2.  ... 
doi:10.9790/0661-0763439 fatcat:y5ot2selhve7vnoewtpttfxwye

Optimized Bully Algorithm

Sathesh B.M
2015 International Journal of Computer Applications  
Bully Algorithm by Garcia-Molina is a classic algorithm for leader election in a distributed system.  ...  The end result is a modified election bully algorithm which is much efficient than the existing leader election algorithms used in a distributed environment.  ...  Elections in Distributed Systems In a distributed system, when the leader is crashed, other nodes must elect another leader.  ... 
doi:10.5120/21641-4971 fatcat:577sgcmnpjfkdnt5gjqw2igcla