The fault span of crash failures

George Varghese, Mahesh Jayaram
2000 Journal of the ACM  
Overall, we show a strict hierarchy in terms of the set of states reachable by crash failures in the three link models.  ...  We further characterize the reachable states caused by crash failures using reliable non-FIFO and reliable FIFO links.  ...  The fault-span defines the power of crash failures - the larger the fault-span, the more dangerous the effect of crash failures.  ... 
doi:10.1145/333979.333982 fatcat:bwomlvf6zbh5jeaicjjaumrho4

Fault Tolerant Network Constructors [article]

Othon Michail, Paul G. Spirakis, Michail Theofilatos
2019 arXiv   pre-print
In particular, if an unbounded number of crash faults may occur, we prove that (i) the only constructible graph language is that of spanning cliques and (ii) a strong impossibility result holds even if  ...  In this work, we consider adversarial crash faults of nodes in the network constructors model [Michail and Spirakis, 2016].  ...  In order to form a spanning line under crash failures, the P component will be executing our FT Spanning Line protocol which is guaranteed to construct a line, spanning eventually the non-faulty nodes.  ... 
arXiv:1903.05992v2 fatcat:rngslsmiujdyzenz7l6i7ioqr4

An autonomic hierarchical reliable broadcast protocol for asynchronous distributed systems with failure detection

Denis Jeanneau, Luiz A. Rodrigues, Luciana Arantes, Elias P. Duarte
2017 Journal of the Brazilian Computer Society  
failures of processes.  ...  We consider that processes can fail by crashing, do not recover, and faults are eventually detected by all correct processes.  ...  Availability of data and materials Not applicable.  ... 
doi:10.1186/s13173-017-0064-9 fatcat:qr3rvlickbh4npyf6fcggozx4u

Fault Resilience of Structured P2P Systems [chapter]

Zhiyu Liu, Guihai Chen, Chunfeng Yuan, Sanglu Lu, Chengzhong Xu
2004 Lecture Notes in Computer Science  
This paper analyzes the performance of Chord [7] and Koorde [2], and find out the crash point of each network through the simulation experiment.  ...  A fundamental problem that confronts structured peer-to-peer system that use DHT technologies to map data onto nodes is the performance of the network under the circumstance that a large percentage of nodes  ...  By our definition, the crash point of Koorde is 55%, while the crash point of Chord is about 70%. And in Koorde, when 15% nodes fails, failure start to emerge.  ... 
doi:10.1007/978-3-540-30480-7_77 fatcat:zn2t37lawbe75gwisis6ovdmo4

A scalable double in-memory checkpoint and restart scheme towards exascale

Gengbin Zheng, Xiang Ni, Laxmikant V. Kale
2012 IEEE/IFIP International Conference on Dependable Systems and Networks Workshops (DSN 2012)  
As the size of supercomputers increases, the probability of system failure grows substantially, posing an increasingly significant challenge for scalability.  ...  Checkpoint-based fault tolerance methods are effective approaches at dealing with faults. With these methods, the state of the entire parallel application is checkpointed to reliable storage.  ...  Instead of using a point-to-point implementation of a fault-aware barrier, using the spanning tree-based barrier reduces the time to checkpoint.  ... 
doi:10.1109/dsnw.2012.6264677 dblp:conf/dsn/ZhengNK12 fatcat:p56cp4bohzh7jli3rrfvtkb4sy

Tolerating SDN Application Failures with LegoSDN

Balakrishnan Chandrasekaran, Theophilus Benson
2014 Proceedings of the 13th ACM Workshop on Hot Topics in Networks - HotNets-XIII  
At the heart of these issues is a set of fate-sharing relationships: The first between the SDN-Apps and controllers, wherein the crash of the former induces a crash of the latter, and thereby affecting  ...  Among the issues that hamper SDN's adoption two stand out: reliability and fault tolerance.  ...  Handling failures that span multiple transactions: Currently, LegoSDN can easily overcome failure induced by the most recently processed event.  ... 
doi:10.1145/2670518.2673880 dblp:conf/hotnets/ChandrasekaranB14 fatcat:ssq3pw3gwrajtbmrzavctdvmei

Tolerating SDN application failures with LegoSDN

Balakrishnan Chandrasekaran, Theophilus Benson
2014 Proceedings of the third workshop on Hot topics in software defined networking - HotSDN '14  
At the heart of these issues is a set of fate-sharing relationships: The first between the SDN-Apps and controllers, wherein the crash of the former induces a crash of the latter, and thereby affecting  ...  Among the issues that hamper SDN's adoption two stand out: reliability and fault tolerance.  ...  Handling failures that span multiple transactions: Currently, LegoSDN can easily overcome failure induced by the most recently processed event.  ... 
doi:10.1145/2620728.2620781 dblp:conf/sigcomm/ChandrasekaranB14 fatcat:q3txpi42b5cqtcqtesvefs46ui

Bundling Messages to Reduce the Cost of Tree-Based Broadcast Algorithms

Luiz A. Rodrigues, Elias P. Duarte, Joao Paulo de Araujo, Luciana Arantes, Pierre Sens
2018 2018 Eighth Latin-American Symposium on Dependable Computing (LADC)  
Experimental results obtained with simulation are presented showing the performance of the algorithm in terms of the latency and the number and sizes of messages employed.  ...  The algorithm is autonomic in the sense that it employs dynamic trees rooted at the source process and which rebuild themselves after processes crash.  ...  In the first, a single process fails at the beginning of the simulation; in the second experiment failures were generated randomly during the simulation. Fault of a single process.  ... 
doi:10.1109/ladc.2018.00022 dblp:conf/ladc/RodriguesDAAS18 fatcat:y4yxjltiyrg6jlhmyivmezqn5m

Harmful dogmas in fault tolerant distributed computing

Bernadette Charron-Bost, André Schiper
2007 ACM SIGACT News  
It is hard to question these modelling choices, as they have gained the status of dogmas.  ...  Nevertheless, we propose a simpler and more natural approach that allows us to get rid of these dogmas, and to handle all types of benign fault, be it static or dynamic, permanent or transient, in a unified  ...  It should also be noted that the unification provided by the HO model must be seen from the perspective of constructing solutions to consensus that span the whole class of benign faults.  ... 
doi:10.1145/1233481.1233496 fatcat:z2movhmjzbccviwpyzwfdukx7i

Distributed fault tolerance: lessons from Delta-4

D. Powell
1994 IEEE Micro  
Acknowledgments The Commission of the European Community partially supported the Delta-4 project through the ESPRIT program (Projects 818 and 2252).  ...  Special and very personal thanks must of course go to David, David, Doug, Gottfried, Marc, Pascal, Paulo, Peter, and Santosh, as well as all my compatriots in the Dependable Computing Group at LAAS.  ...  Sometimes, however, a crash can remain undetected by the rest of the system if none of the AMP groups that span the crashed node are active, that is, attempting to exchange messages.  ... 
doi:10.1109/40.259898 fatcat:6435u4ycijetdhzbpguzsybmja

Dependability Characterization of Middleware Services [chapter]

Eric Marsden, Nicolas Perrot, Jean-Charles Fabre, Jean Arlat
2002 IFIP Advances in Information and Communication Technology  
We illustrate an approach for characterizing the dependability of middleware service implementations, with respect to corrupt method invocations arriving over the network.  ...  Integrators of CORBA-based dependable systems require information on the robustness of candidate middleware implementations, in order to select the implementation that is best suited to their requirements  ...  This work is partially supported by the European Community (project IST-1999-11585: DSoS - Dependable Systems of Systems). 5.  ... 
doi:10.1007/978-0-387-35599-3_13 fatcat:jiiokedopnbynkf5whodpdgutm

A Resource Management System for Fault Tolerance in Grid Computing

HwaMin Lee, DooSoon Park, Min Hong, Sang-Soo Yeo, SooKyun Kim, SungHoon Kim
2009 2009 International Conference on Computational Science and Engineering  
In order to provide fault tolerance service and satisfy QoS requirements, we expand the definition of failure, such as process failure, processor failure, and network failure.  ...  Since the failure of resources affects job execution fatally, fault tolerance service is essential in computational grids.  ...  [Definition 1] Failure It is a failure if and only if one of the following two conditions is satisfied (but not both). 1. A resource service stops due to a resource crash (Crash failure) 2.  ... 
doi:10.1109/cse.2009.257 dblp:conf/cse/LeePHYKK09 fatcat:2z77rt5nyfgujcojgionbxvxhq

Page 5784 of Mathematical Reviews Vol. , Issue 2001H [page]

2001 Mathematical Reviews  
Louis, MO) The fault span of crash failures. (English summary) J. ACM 47 (2000), no. 2, 244-293.  ...  The authors characterize the fault spans of crashing protocols.  ... 

Dependability Assessment of the Android OS Through Fault Injection

Domenico Cotroneo, Antonio Ken Iannillo, Roberto Natella, Stefano Rosiello
2019 IEEE Transactions on Reliability  
In this paper, we study how to assess the impact of faults on the quality of user experience in the Android mobile OS through fault injection.  ...  We first address the problem of identifying a realistic fault model for the Android OS, by providing to developers a set of lightweight and systematic guidelines for fault modeling.  ...  ACKNOWLEDGMENTS This work has been partially supported by UniNA and Compagnia di San Paolo in the frame of Programme STAR, and by Huawei Technologies Co., Ltd.  ... 
doi:10.1109/tr.2019.2954384 fatcat:bdxmndyxjba5ncyngs2cwnnfwm

Crash failures can drive protocols to arbitrary states

Mahesh Jayaram, George Varghese
1996 Proceedings of the fifteenth annual ACM symposium on Principles of distributed computing - PODC '96  
A crashing network protocol is an asynchronous  ...  The asynchronous model is easiest to work with because it imposes the least restrictions. Fault-spans provide insight into failure modes of protocols.  ...  Our main theorem essentially states: Any crashing protocol that works in the CAML model can be driven into any possible protocol state. Thus the fault-span of the CAML model is very large.  ... 
doi:10.1145/248052.248104 dblp:conf/podc/JayaramV96 fatcat:ighlcnudk5cg5bbowhwjna2koe