Fast crash recovery in RAMCloud

Diego Ongaro, Stephen M. Rumble, Ryan Stutsman, John Ousterhout, Mendel Rosenblum
2011 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles - SOSP '11  
RAMCloud is a DRAM-based storage system that provides inexpensive durability and availability by recovering quickly after crashes, rather than storing replicas in DRAM.  ...  scale to recover quickly after crashes.  ...  Bigtable, like RAMCloud, implements fast crash recovery (during which data is unavailable) rather than online replication.  ... 
doi:10.1145/2043556.2043560 dblp:conf/sosp/OngaroRSOR11 fatcat:iglpm5pr55eajbwylbjhpebxe4

The RAMCloud Storage System

John Ousterhout, Mendel Rosenblum, Stephen Rumble, Ryan Stutsman, Stephen Yang, Arjun Gopalan, Ashish Gupta, Ankita Kejriwal, Collin Lee, Behnam Montazeri, Diego Ongaro, Seo Jin Park (+1 others)
2015 ACM Transactions on Computer Systems  
RAMCloud's crash recovery mechanism harnesses the resources of the entire cluster working concurrently so that recovery performance scales with cluster size.  ...  The log-structured approach also simplifies crash recovery and utilizes DRAM twice as efficiently as traditional storage allocators such as malloc.  ...  In addition, fast crash recovery requires fast failure detection, and the system must deal with secondary errors that occur during recovery.  ... 
doi:10.1145/2806887 fatcat:fg3r5yahbjhxhcor6m2w2q6bxy

Exploiting Commutativity For Practical Fast Replication [article]

Seo Jin Park, John Ousterhout
2017 arXiv   pre-print
In RAMCloud, CURP improved write latency by ~2x (13.8 µs → 7.3 µs) and write throughput by 4x.  ...  This strategy allows most operations to complete in 1 RTT (the same as an unreplicated system). We implemented CURP in the Redis and RAMCloud storage systems.  ...  CURP can be used with RAMCloud without sacrificing its fast crash recovery [15]).  ... 
arXiv:1710.09921v1 fatcat:ox5t6b2jmnfi3cy4mvczjwydt4

An Empirical Evaluation of How the Network Impacts the Performance and Energy Efficiency in RAMCloud

Yacine Taleb, Shadi Ibrahim, Gabriel Antoniu, Toni Cortes
2017 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)  
Through a study carried on RAMCloud, we focus on two settings: 1) clients are collocated within the same network as the storage servers (with Infiniband interconnects); 2) clients access the servers from  ...  In-memory storage systems emerged as a de-facto building block for today's large scale Web architectures and Big Data processing frameworks.  ...  This enables RAMCloud to harness large-scale to enable fast crash recovery.  ... 
doi:10.1109/ccgrid.2017.127 dblp:conf/ccgrid/TalebIAC17 fatcat:sqfmdzvgcncfffylubnc6zgviy


Prapti Panigrahi
2018 International Journal of Research in Engineering and Technology  
In case of disasters, cloud storage can help in very quick recovery of data. Bandwidth usage can also be reduced, by sharing access-links instead of the complete files.  ...  This work aims to assist the reader in proper selection of architecture based on the types of operation the user of the architecture intends to have in his/her application.  ...  Each recovery master generates the hash table from log-structured data which is later merged. The key to fast recovery is utilizing the scale of the RAMCloud cluster.  ... 
doi:10.15623/ijret.2018.0710007 fatcat:q54suv7n6jcanieoqwotohdl7u

The case for RAMCloud

John Ousterhout, Guru Parulkar, Mendel Rosenblum, Stephen M. Rumble, Eric Stratmann, Ryan Stutsman, Parag Agrawal, David Erickson, Christos Kozyrakis, Jacob Leverich, David Mazières, Subhasish Mitra (+2 others)
2011 Communications of the ACM  
With scalable high-performance storage entirely in DRAM, RAMCloud will enable a new breed of data-intensive applications.  ...  RAMCloud stores all of its information in the main  ...  The web has driven development of new large-scale applications that have effectively scaled compute power and storage capacity but have not  ...  In any case, all these technologies are similar in that they provide fast access to small chunks of data.  ... 
doi:10.1145/1965724.1965751 fatcat:nmp2qlgjfvcivhx3fwegyegvuq

Taming uncertainty in distributed systems with help from the network

Joshua B. Leners, Trinabh Gupta, Marcos K. Aguilera, Michael Walfish
2015 Proceedings of the Tenth European Conference on Computer Systems - EuroSys '15  
Network and process failures cause complexity in distributed applications.  ...  The research was supported in part by NSF grants CNS-1055057, CNS-1040083, and CCF-1048269.  ...  Specifically, RAMCloud detects failures using a short timeout of hundreds of milliseconds; if the coordinator times out on a master, the coordinator starts data recovery, which is very fast.  ... 
doi:10.1145/2741948.2741976 dblp:conf/eurosys/LenersGAW15 fatcat:5enktayvpvgkvddmvwmrjm24dm

Replication-Based Fault-Tolerance for Large-Scale Graph Processing

Peng Wang, Kaiyuan Zhang, Rong Chen, Haibo Chen, Haibing Guan
2014 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks  
This paper observes that the vertex replicas created for distributed graph computation can be naturally extended for fast in-memory recovery of graph states.  ...  in-memory reconstruction of failed vertices from replicas in other machines.  ...  This work is supported in part by Doctoral  ... 
doi:10.1109/dsn.2014.58 dblp:conf/dsn/WangZCCG14 fatcat:vfuicg3rqrf3lbivizr52ggwr4

Stateless Network Functions

Murad Kablan, Blake Caldwell, Richard Han, Hani Jamjoom, Eric Keller
2015 Proceedings of the 2015 ACM SIGCOMM Workshop on Hot Topics in Middleboxes and Network Function Virtualization - HotMiddlebox '15  
In this paper, we propose that network functions should be similarly redesigned to be stateless.  ...  Our Click-based prototype integrates with RAMCloud; using NAT as an example network function, we demonstrate that we are able to create stateless network functions that maintain the desired performance  ...  ACKNOWLEDGEMENTS This work was funded in part by the following grants: NSF NeTS 1320389 and NSF XPS 1337399.  ... 
doi:10.1145/2785989.2785993 dblp:conf/sigcomm/KablanCHJK15 fatcat:grtdkb4xezemlbjymjjifu6kdm

DXRAM's Fault-Tolerance Mechanisms Meet High Speed I/O Devices [article]

Kevin Beineke, Stefan Nothaas, Michael Schoettner
2018 arXiv   pre-print
But, when storing the data in RAM on thousands of servers one has to consider server failures. Only a few in-memory key-value stores provide automatic online recovery of failed servers.  ...  The most prominent example of these systems is RAMCloud. Another system with sophisticated fault-tolerance mechanisms is DXRAM which is optimized for small data objects.  ...  A fast reorganization is important to keep a constant write throughput (provide enough free space for writes) and to allow a fast crash recovery (less invalid/outdated objects to process).  ... 
arXiv:1807.03562v2 fatcat:p4yobou5vjgqrn4lvdumamuelq

Assise: Performance and Availability via NVM Colocation in a Distributed File System [article]

Thomas E. Anderson, Marco Canini, Jongyul Kim, Dejan Kostić, Youngjin Kwon, Simon Peter, Waleed Reda, Henry N. Schuh, Emmett Witchel
2020 arXiv   pre-print
To demonstrate this, we built the Assise distributed file system, based on a persistent, replicated coherence protocol for managing a set of server-colocated PMMs as a fast, crash-recoverable cache between  ...  Fail-over and Recovery Assise caches file system state with persistence in local NVM, which it can use for fast recovery. Assise optimizes recovery performance according to crash prevalence.  ...  RAMCloud requires a full-bisection bandwidth network for fast recovery. Assise leverages colocated NVM for recovery and does not require full-bisection bandwidth or asynchronous backup storage.  ... 
arXiv:1910.05106v2 fatcat:3sjpue3tqzd3haqnh4ka72fezi

FluidMem: Memory as a Service for the Datacenter [article]

Blake Caldwell, Youngbin Im, Sangtae Ha, Richard Han, Eric Keller
2017 arXiv   pre-print
In this paper, we present FluidMem, a complete system to realize disaggregated memory in the datacenter.  ...  Disaggregating resources in data centers is an emerging trend.  ...  As an example, RAMCloud provides crash-recovery, tolerating node failures without loss of availability to the data store.  ... 
arXiv:1707.07780v1 fatcat:thnnbfklg5bmxgnwg4ngtddoly

LogBase: A Scalable Log-structured Database System in the Cloud

Hoang Tam Vo, Sheng Wang, Divyakant Agrawal, Gang Chen, Beng Chin Ooi
2012 Proceedings of the VLDB Endowment  
In this paper, we introduce LogBase - a scalable log-structured database system that adopts log-only storage for removing the write bottleneck and supporting fast system recovery.  ...  Write-ahead-logging is a common approach for providing recovery capability while improving performance in most storage systems.  ...  Acknowledgments This work was in part supported by the Singapore MOE Grant No. R252-000-454-112.  ... 
doi:10.14778/2336664.2336673 fatcat:afskwwel3zb77hzsxqedjwtmay

LogBase: A Scalable Log-structured Database System in the Cloud [article]

Hoang Tam Vo, Sheng Wang, Divyakant Agrawal, Gang Chen, Beng Chin Ooi
2012 arXiv   pre-print
In this paper, we introduce LogBase - a scalable log-structured database system that adopts log-only storage for removing the write bottleneck and supporting fast system recovery.  ...  Write-ahead-logging is a common approach for providing recovery capability while improving performance in most storage systems.  ...  Acknowledgments This work was in part supported by the Singapore MOE Grant No. R252-000-454-112.  ... 
arXiv:1207.0140v1 fatcat:ek6r2lr36bfg3hhg7dp6pqxhma

Stretching Multi-Ring Paxos [article]

Samuel Benz, Leandro Pacheco de Sousa, Fernando Pedone
2015 arXiv   pre-print
., independent Paxos instances), a large number of replicas in a ring, and a global deployment.  ...  We also report on the performance of recovery under peak load and present two novel extensions to boost Multi-Ring Paxos's performance.  ...  To recover the data fast RAMCloud relies on the collective force of thousands of servers.  ... 
arXiv:1504.04942v1 fatcat:ipncz5mh7jb3flmqfg3deghvyu