Fault tolerance via idempotence

Ganesan Ramalingam, Kapil Vaswani
2013 SIGPLAN notices  
Building distributed services and applications is challenging due to the pitfalls of distribution such as process and communication failures. A natural solution to these problems is to detect potential failures, and retry the failed computation and/or resend messages. Ensuring correctness in such an environment requires distributed services and applications to be idempotent. In this paper, we study the inter-related aspects of process failures, duplicate messages, and idempotence. We first
more » ... duce a simple core language (based on λ-calculus) inspired by modern distributed computing platforms. This language formalizes the notions of a service, duplicate requests, process failures, data partitioning, and local atomic transactions that are restricted to a single store. We then formalize a desired (generic) correctness criterion for applications written in this language, consisting of idempotence (which captures the desired safety properties) and failure-freedom (which captures the desired progress properties). We then propose language support in the form of a monad that automatically ensures failfree idempotence. A key characteristic of our implementation is that it is decentralized and does not require distributed coordination. We show that the language support can be enriched with other useful constructs, such as compensations, while retaining the coordination-free decentralized nature of the implementation. We have implemented the idempotence monad (and its variants) in F# and C# and used our implementation to build realistic applications on Windows Azure. We find that the monad has low runtime overheads and leads to more declarative applications.
doi:10.1145/2480359.2429100 fatcat:kfe7mv3pfnfgziyhnwvj7o4iby