Separation and Equivalence results for the Crash-stop and Crash-recovery Shared Memory Models [article]

Ohad Ben-Baruch, Srivatsan Ravi
2020 arXiv   pre-print
Our first contribution formalizes the crash-recovery model and how explicit process crashes and recovery introduce further dimensionalities over the standard crash-stop shared memory model.  ...  This work formalizes and answers the question of whether an implementation of a data type derived for the crash-stop shared memory model is also strict-linearizable in the crash-recovery model.  ...  Our contributions establish equivalence and separation results for crash-stop and the identified crash-recovery models, thus providing a precise characterization of the intricacies in applying a concurrent  ... 
arXiv:2012.03692v1 fatcat:5oa62ytyf5g6rpomyk2i3lttrq

The collective memory of amnesic processes

Rachid Guerraoui, Ron R. Levy, Bastian Pochon, Jim Pugh
2008 ACM Transactions on Algorithms  
We revisit the notion of atomicity in the crash-recovery context and introduce a generic algorithm that emulates an atomic memory.  ...  This paper considers the problem of robustly emulating a shared atomic memory over a distributed message passing system where processes can fail by crashing and possibly recover.  ...  Acknowledgments We are very grateful to the reviewers for their significant help to improve the presentation of this paper and highlight fundamental assumptions underlying our algorithm.  ... 
doi:10.1145/1328911.1328923 fatcat:mcewbmotcveyngscozbidhmquy

Delay-Free Concurrency on Faulty Persistent Memory [article]

Naama Ben-David, Guy E. Blelloch, Michal Friedman, Yuanhao Wei
2020 arXiv   pre-print
In this paper, we present a construction that takes any concurrent program with reads, writes and CASs to shared memory and makes it persistent, i.e., can be continued after one or more processes fault  ...  Since caches are expected to remain volatile, concurrent data structures and algorithms must be redesigned to guarantee that they are left in a consistent state after a system crash, and that the execution  ...  Throughout the paper we will use private cache model to refer to the PPM model, and shared cache model for the shared cache variant.  ... 
arXiv:1806.04780v3 fatcat:bd2ccf6wdjenbdckm4jyyj742m

The Inherent Cost of Remembering Consistently

Nachshon Cohen, Rachid Guerraoui, Igor Zablotchi
2018 Proceedings of the 30th on Symposium on Parallelism in Algorithms and Architectures - SPAA '18  
Non-volatile memory (NVM) promises fast, byte-addressable and durable storage, with raw access latencies in the same order of magnitude as DRAM.  ...  But in order to take advantage of the durability of NVM, programmers need to design persistent objects which maintain consistent state across system crashes and restarts.  ...  ACKNOWLEDGMENTS We wish to thank the anonymous reviewers for their helpful comments on improving the paper.  ... 
doi:10.1145/3210377.3210400 dblp:conf/spaa/CohenGZ18 fatcat:meutuc3uubh2bfpedi22nty6mm

Transaction-Based Process Crash Recovery of File System Namespace Modules

David C. Van Moolenbroek, Raja Appuswamy, Andrew S. Tanenbaum
2013 2013 IEEE 19th Pacific Rim International Symposium on Dependable Computing  
We then introduce a crash recovery solution that is based on transactions, and detail the requirements for a system to implement this solution.  ...  We show that the likely presence of software bugs in such modules calls for the ability to recover from crashes, but that the current state of the art falls short of the desired behavior.  ...  This failure model covers more cases than fail-stop [22], because it also allows for wild memory overwrites as long as the overwritten memory is either not accessed by another process or (for example  ... 
doi:10.1109/prdc.2013.56 dblp:conf/prdc/MoolenbroekAT13 fatcat:3iuw5sqsn5cl7ejznwbskjlqw4

Advances in the Design and Implementation of Group Communication Middleware [chapter]

Daniel Bünzli, Rachele Fuzzati, Sergio Mena, Uwe Nestmann, Olivier Rütti, André Schiper, Paweł T. Wojciechowski
2006 Lecture Notes in Computer Science  
The paper discusses the results obtained.  ...  The goal of the project was to improve the state of the art of group communication in several directions: protocol frameworks, group communication stacks, specification, verification and robustness.  ...  Specifications for the crash-recovery model have been proposed in the past, but they fail to capture the fundamental difference between the crash-stop and the crash-recovery model.  ... 
doi:10.1007/11808107_8 fatcat:zy6loymyyje5fbun5zhwghru7q

Recoverable Consensus in Shared Memory [article]

Wojciech Golab
2018 arXiv   pre-print
Herlihy's consensus hierarchy ranks the power of various synchronization primitives for solving consensus in a model where asynchronous processes communicate through shared memory and fail by halting.  ...  Several results are proved in this model: (i) We prove that any primitive at level two of Herlihy's hierarchy remains at level two if simultaneous crash-recovery failures are introduced.  ...  These results separate the model with independent crash-recovery failures both from the model with simultaneous crash-recovery failures, and the standard model with halting failures, in terms of the computability  ... 
arXiv:1804.10597v2 fatcat:5qpapv2wxrbplgfrq6may4zwl4

Safe termination detection in an asynchronous distributed system when processes may crash and recover

Neeraj Mittal, Kuppahalli L. Phaneesh, Felix C. Freiling
2009 Theoretical Computer Science  
It has been shown that the problem is impossible to solve under the crash-recovery model in general.  ...  We investigate the termination detection problem in an asynchronous distributed system under the crash-recovery model.  ...  In this case, the termination condition for a distributed computation in the crash-recovery model becomes equivalent to that in the failure-free model.  ... 
doi:10.1016/j.tcs.2008.10.011 fatcat:eje7ybgevvd6znfhju6kar4wg4

Safe Termination Detection in an Asynchronous Distributed System When Processes May Crash and Recover [chapter]

Neeraj Mittal, Kuppahalli L. Phaneesh, Felix C. Freiling
2006 Lecture Notes in Computer Science  
It has been shown that the problem is impossible to solve under the crash-recovery model in general.  ...  We investigate the termination detection problem in an asynchronous distributed system under the crash-recovery model.  ...  In this case, the termination condition for a distributed computation in the crash-recovery model becomes equivalent to that in the failure-free model.  ... 
doi:10.1007/11945529_10 fatcat:idukxiqsobba7mzoqyj6qvjuwe

Lessons from FTM: an experiment in design and implementation of a low-cost fault tolerant system

G. Muller, M. Banatre, N. Peyrouze, B. Rochat
1996 IEEE Transactions on Reliability  
We comment on the reasons for the evolution of our stable memory technology from hardware to software. Finally, we present a performance evaluation of the FTM prototype.  ...  These objectives were achieved using the Mach micro-kernel and a modular set of reliable servers which implement application checkpoints and provide continuous system functions despite machine crashes.  ...  Puaut and P.A. Lee from Newcastle University for their pertinent comments and patient reviews of early versions of this document. J.P.  ... 
doi:10.1109/24.510822 fatcat:ydeyudat6zbejkk4zijzdqu7za

On a Virtual Shared Memory Cluster System with Virtual Machines

Minakshi Tripathy, C.R. Tripathy
2011 International Journal of Computer and Electrical Engineering  
In this paper, an architecture with a load balancing model and a fault tolerant model for virtual shared memory clusters is proposed.  ...  The performance evaluation results show that the proposed system achieves significant speedup in terms of execution time and checkpoint time.  ...  This paper also proposes a checkpoint and recovery based fault tolerant model for virtual shared memory clusters. The checkpoints are stored in a distributed file system.  ... 
doi:10.7763/ijcee.2011.v3.416 fatcat:7uketa3gmbh4vlk5scd3ctajyy

The RAMCloud Storage System

John Ousterhout, Mendel Rosenblum, Stephen Rumble, Ryan Stutsman, Stephen Yang, Arjun Gopalan, Ashish Gupta, Ankita Kejriwal, Collin Lee, Behnam Montazeri, Diego Ongaro, Seo Jin Park (+1 others)
2015 ACM Transactions on Computer Systems  
This frees application developers from the need to manage a separate durable storage system, or to maintain consistency between in-memory and durable storage.  ...  The log-structured approach also simplifies crash recovery and utilizes DRAM twice as efficiently as traditional storage allocators such as malloc.  ...  We assume a fail-stop model for failures, in which the only way servers fail is by crashing. If a server has not crashed, then we assume that it is functioning correctly.  ... 
doi:10.1145/2806887 fatcat:fg3r5yahbjhxhcor6m2w2q6bxy

Memory Reclamation for Recoverable Mutual Exclusion [article]

Sahil Dhoked, Neeraj Mittal
2021 arXiv   pre-print
Our RMR and space complexities are applicable to both $CC$ and $DSM$ memory models.  ...  In this work, we present the first "general" recoverable algorithm for memory reclamation in the context of recoverable mutual exclusion.  ...  The two most common memory models used to analyze the performance of an RME algorithm are cache-coherent (CC) and distributed shared memory (DSM) models.  ... 
arXiv:2103.01538v1 fatcat:y6wwwn77wvf5te7spselovreym

Using model checking to find serious file system errors

Junfeng Yang, Paul Twohey, Dawson Engler, Madanlal Musuvathi
2006 ACM Transactions on Computer Systems  
For each file system, FiSC found demonstrable events leading to the unrecoverable destruction of metadata and entire directories, including the file system root directory "/".  ...  Model checking is a formal verification technique tuned for finding corner-case errors by comprehensively exploring the state spaces defined by a system.  ...  We are also grateful to Andrew Myers (our shepherd), Ken Ashcraft, Brian Gaeke, Lea Kissner, Ilya Shpitser, Xiaowei Yang, Monica Lam and the anonymous reviewers for their careful reading and valuable feedback  ... 
doi:10.1145/1189256.1189259 fatcat:zrhr7vghkzdxjmlym6ctyohuhy

Using Crash Hoare logic for certifying the FSCQ file system

Haogang Chen, Daniel Ziegler, Tej Chajed, Adam Chlipala, M. Frans Kaashoek, Nickolai Zeldovich
2015 Proceedings of the 25th Symposium on Operating Systems Principles - SOSP '15  
To state FSCQ's theorems, this paper introduces the Crash Hoare logic (CHL), which extends traditional Hoare logic with a crash condition, a recovery procedure, and logical address spaces for specifying  ...  CHL also reduces the proof effort for developers through proof automation. Using CHL, we developed, specified, and proved the correctness of the FSCQ file system.  ...  Acknowledgments Thanks to Nathan Beckmann, Butler Lampson, Robert Morris, and the IronClad team for insightful discussions and feedback.  ... 
doi:10.1145/2815400.2815402 dblp:conf/sosp/ChenZCCKZ15 fatcat:sjbqajcixzg2hgdcqhhetqqtx4