Reliable and total order broadcast in the crash-recovery model

Romain Boichat, Rachid Guerraoui
2005 Journal of Parallel and Distributed Computing  
This paper addresses the problem of broadcasting messages in a reliable and totally ordered manner when processes and channels may crash and recover, or crash and never recover. We present a suite of specifications of reliable and total order broadcast primitives and we describe algorithms that implement those specifications. Our approach is modular and incremental. It is modular in the sense that the properties of broadcast primitives are first given separately and then composed: this provides
a comprehensive design space for broadcast semantics. It is incremental in the sense that a broadcast algorithm implementing a given specification is obtained by transforming an algorithm that implements a weaker specification: this gives an automatic way to improve the resilience of broadcast primitives. We derive specific reliable and total order broadcast algorithms and we discuss their performance and optimality.
... and total order broadcast primitives assuming a practical asynchronous crash-recovery model: processes and channels may crash and recover or crash and never recover.
Footnote 2: We focus here on deterministic algorithms, unlike [4] for instance, which considers randomised algorithms that offer probabilistic guarantees.
Motivation. Given their wide applicability, broadcast primitives have been extensively studied for over a decade. In particular, many papers have been published on algorithms that implement reliable and total order broadcast primitives in a crash-stop system model [10, 14, 3, 13, 5, 7]. According to this model, channels are reliable and processes execute the algorithm assigned to them unless they crash, in which case they simply halt their activities. Processes that do not crash are called correct processes. The simplicity of this model was key to studying and comparing many broadcast algorithms, and to devising rigorous proofs of their correctness. The practicality of the crash-stop system model is, however, questionable. The assumption that some processes never crash, and that those that crash never recover, is indeed simple but quite unrealistic. In practice, processes that crash eventually recover and resume their activities. In the meantime, i.e., between the crash and the recovery events, the messages sent to a crashed process are lost. After a crash, a process typically loses the content of its volatile memory and only preserves the content of its stable storage.
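The difference between volatile memory and stable storage can be made concrete with a small sketch. It is an illustration only, not the paper's algorithm: the `Process` class, its method names, and the JSON-lines log file are all hypothetical, and a synchronous write is approximated with `os.fsync`.

```python
import json
import os
import tempfile


class Process:
    """Hypothetical process with volatile state and a stable-storage log."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.delivered = []  # volatile memory: lost on a crash
        self.recover()

    def forced_log(self, message):
        # Append the message and force it to disk before returning
        # (assumption: os.fsync stands in for a synchronous disk write).
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(message) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def deliver(self, message, log=True):
        if log:
            self.forced_log(message)
        self.delivered.append(message)

    def crash(self):
        # A crash wipes volatile memory; the log file is untouched.
        self.delivered = []

    def recover(self):
        # Rebuild volatile state from whatever reached stable storage.
        self.delivered = []
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as f:
                self.delivered = [json.loads(line) for line in f if line.strip()]


def demo():
    path = os.path.join(tempfile.mkdtemp(), "process.log")
    p = Process(path)
    p.deliver("m1")             # logged to stable storage
    p.deliver("m2", log=False)  # kept in volatile memory only
    p.crash()
    p.recover()
    return p.delivered
```

In `demo`, only the message that was written to the log is still known to the process after recovery; the unlogged one is gone, which is the inconsistency a crash-recovery algorithm has to guard against.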
Devising algorithms for the crash-recovery model is trickier than for the crash-stop model, precisely because stable storage must be used carefully. Processes should log in stable storage the crucial information that will help them recover in a consistent state, but performing a forced log (a synchronous write on disk) is expensive and should be avoided as much as possible. In summary, there is a significant literature on crash-stop resilient broadcast algorithms, but these do not fit the more realistic crash-recovery model, which introduces non-trivial complexity through the use of stable storage. The motivation of our work is precisely to devise crash-recovery resilient broadcast primitives.
Specifications and implementations. The specification of a reliable broadcast primitive is composed of three kinds of properties [15]: a validity property (V) that ensures the liveness of the broadcast, an agreement property (A) that ensures consensus on message delivery, and an integrity property (I) that ensures the absence of spurious messages and multiple deliveries. The specification of a total order broadcast primitive contains an additional total order (TO) type of property [15]. Devising crash-recovery resilient broadcast primitives first requires providing meaningful variants of those properties in a crash-recovery model. Indeed, the possibility for the processes ...
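The integrity and total order properties can be illustrated by checking them against recorded delivery traces. The sketch below uses a simplified, textbook-style reading of the two properties, not the crash-recovery variants the paper defines; the function names and the trace format (a dict mapping each process to its list of delivered messages) are assumptions made for illustration.

```python
def check_integrity(broadcast, deliveries):
    """Simplified integrity: no spurious messages, no duplicate deliveries.

    broadcast  -- set of messages that were actually broadcast
    deliveries -- dict: process name -> list of delivered messages, in order
    """
    for seq in deliveries.values():
        if len(seq) != len(set(seq)):
            return False  # some message was delivered more than once
        if any(m not in broadcast for m in seq):
            return False  # a message was delivered but never broadcast
    return True


def check_total_order(deliveries):
    """Simplified total order: any two processes deliver the messages
    they have in common in the same relative order.
    Assumes each trace already satisfies integrity (no duplicates)."""
    seqs = list(deliveries.values())
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            common = set(seqs[i]) & set(seqs[j])
            order_i = [m for m in seqs[i] if m in common]
            order_j = [m for m in seqs[j] if m in common]
            if order_i != order_j:
                return False
    return True
```

For example, the traces `{"p": ["a", "b", "c"], "q": ["a", "c"]}` pass the total order check because both processes deliver `a` before `c`, whereas `{"p": ["a", "b"], "q": ["b", "a"]}` fail it.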
doi:10.1016/j.jpdc.2004.10.008 fatcat:dwjr54hhb5g6pfqezb2looqxta