Fault-tolerant algorithms for tick-generation in asynchronous logic

Danny Dolev, Matthias Függer, Ulrich Schmid, Christoph Lenzen
2014 Journal of the ACM  
Today's hardware technology presents a new challenge in designing robust systems. Deep submicron VLSI technology introduces transient and permanent faults that were never considered in low-level system designs in the past. Still, robustness of that part of the system is crucial and needs to be guaranteed for any successful product. Distributed systems, on the other hand, have been dealing with similar issues for decades. However, neither the basic abstractions nor the complexity of contemporary fault-tolerant distributed algorithms match the peculiarities of hardware implementations. This article is part of an attempt to bridge this gap between theory and practice for the clock synchronization problem. Solving this task sufficiently well will make it possible to build an ultrarobust high-precision clocking system for hardware designs like systems-on-chips in critical applications. As our first building block, we describe and prove correct a novel distributed, Byzantine fault-tolerant, probabilistically self-stabilizing pulse synchronization protocol, called FATAL, that can be implemented using standard asynchronous digital logic: Correct FATAL nodes are guaranteed to generate pulses (i.e., unnumbered clock ticks) in a synchronized way, despite a certain fraction of nodes being faulty. FATAL uses randomization only during stabilization and, despite the strict limitations introduced by hardware designs, offers optimal resilience and smaller complexity than all existing protocols. Finally, we show how to leverage FATAL to efficiently generate synchronized, self-stabilizing, high-frequency clocks.
ACM Reference Format: Dolev, D., Függer, M., Schmid, U., and Lenzen, C. 2014. Fault-tolerant algorithms for tick-generation in asynchronous logic: Robust pulse generation.
In sharp contrast to classic distributed computing models, there is no computationally complex discrete zero-time state-transition here.
doi:10.1145/2560561 fatcat:ckb5g6w2ybd4rp7u3rrzqt45qi