On fractional dynamic faults with thresholds

Stefan Dobrev, Rastislav Královič, Richard Královič, Nicola Santoro
2008 Theoretical Computer Science  
Unlike localized communication failures that occur on a fixed (although a priori unknown) set of links, dynamic faults can occur on any link. Known also as mobile or ubiquitous faults, their presence makes many tasks difficult, if not impossible to solve, even in synchronous systems. In this paper, we introduce a new model for dynamic faults in synchronous distributed systems. This model includes as special cases the existing settings studied in the literature. We focus on the hardest setting
of this model, called the simple threshold, where, to be guaranteed that at least one message is delivered in a time step, the total number of messages transmitted in that time step must reach a threshold T ≤ c(G), where c(G) is the edge connectivity of the network. We investigate the problem of broadcasting under this model for the worst threshold T = c(G) in several classes of graphs, as well as in arbitrary networks. We design solution protocols, proving that broadcasting is possible even in this harsh environment. We analyze the time costs, showing that broadcasts can be completed in (low) polynomial time for several networks, including rings (with or without knowledge of n), complete graphs (with or without a chordal sense of direction), hypercubes (with or without orientation), and constant-degree networks (with or without full topological knowledge).

Clearly, no computation is possible if the number of faults that can occur per time unit and the modality of their occurrence are unrestricted. The research quest has thus focused on determining under what conditions on the faults non-trivial computations can be performed in spite of those faults. Constructively, the effort is on designing protocols that correctly solve a problem provided some restrictions on the occurrence of faults hold. The approaches to describing the needed restrictions can be broadly divided into probabilistic and deterministic types. In the probabilistic model there is no a priori upper bound on the total number of faults per time unit, but each transmission has a (known) probability p < 1 of failing. The investigations in this model have focused on designing broadcasting algorithms with low time complexity and high probability of success [2,18]. The drawback of this model is that the solutions derived for it have no deterministic guarantee of correctness.
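The simple-threshold delivery guarantee can be made concrete with a small sketch (the function name and adversary strategy are illustrative, not from the paper): in a round where m messages are transmitted, an adversary may suppress all of them if m < T, but must let at least one through once m ≥ T.

```python
def adversary_drops(messages, T):
    """Worst-case adversary under the simple threshold rule.

    If fewer than T messages are sent in a round, all of them may be
    lost; if at least T are sent, at least one must be delivered.
    Returns the list of messages actually delivered (illustrative
    choice: the adversary delivers as few as the rule allows).
    """
    m = len(messages)
    if m < T:
        return []           # below the threshold: everything may be lost
    return [messages[0]]    # at or above the threshold: one survives

# With threshold T = 2, a lone message can always be destroyed,
# but two simultaneous transmissions guarantee one delivery.
assert adversary_drops(["a"], T=2) == []
assert len(adversary_drops(["a", "b"], T=2)) >= 1
```

This is why, with T = c(G), a protocol must coordinate enough simultaneous transmissions each round to force progress, rather than relying on any single link.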
In our work, we follow the deterministic approach, in which the worst combination of faults satisfying given restrictions is studied. The most basic model of deterministic faults is that of static (or localized) faults, in which faults can occur only on a fixed (but a priori unknown) set of links [1,14]. This restriction is well suited for modeling permanent faults, but is inappropriate for dealing with transient faults, which are very common in practice. Indeed, most of the errors occurring in a network, from a packet loss in the transmission medium to links turned off during network reconfiguration, can be viewed as transient failures: they are repaired after some time, but their location can be arbitrary. Hence, a natural extension in the modeling of network failures is to bound not the location but the number of faults. In this regard, the investigations have focused mostly on the basic problem of broadcasting: an entity has some information that it must communicate to all other entities in the network. Indeed, the possibility or impossibility of performing this task has immediate consequences for many other tasks. A first large group of investigations has considered the so-called cumulative model; that is, there is a (known) limit L on the number of messages that can be lost at each time unit. If the limit is less than the edge connectivity of the network, L < c(G), then broadcast can be achieved by simply flooding and repeating transmissions for an appropriate amount of time. The research has focused on determining the smallest amount of such time, in general or for specific topologies [3-6,8-10,13,16,17], as well as on how to use broadcasting for efficiently computing functions and achieving other tasks [7,19,20]. The advantage of the cumulative model is that solutions designed for it are L-tolerant; that is, they tolerate up to L communication faults per time unit.
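The flooding argument for the cumulative model can be sketched in a few lines of simulation (a hypothetical illustration, not the paper's protocol): on a ring, c(G) = 2, so with L = 1 the adversary can block at most one of the two spreading frontiers per round, and repeated flooding still completes the broadcast.

```python
def flood_broadcast_ring(n, L):
    """Simulate repeated flooding on an n-node ring under the cumulative
    model: a greedy adversary deletes up to L messages per round.

    Requires L < c(ring) = 2 (otherwise the adversary can block both
    frontiers forever and the loop would not terminate).
    Returns the number of rounds until every node is informed.
    """
    informed = {0}
    rounds = 0
    while len(informed) < n:
        rounds += 1
        # Every informed node transmits to both of its neighbours.
        sent = [(u, (u + d) % n) for u in informed for d in (1, n - 1)]
        # The adversary greedily kills up to L messages that would
        # actually inform a new node (one worst-case-style strategy).
        useful = [e for e in sent if e[1] not in informed]
        for e in sorted(useful)[:L]:
            sent.remove(e)
        informed |= {v for (_, v) in sent}
    return rounds

# Fault-free flooding informs a ring in about n/2 rounds; with one
# lost message per round, one frontier still advances every round.
assert flood_broadcast_ring(8, L=0) == 4
assert flood_broadcast_ring(8, L=1) == 7
```

The point of the sketch is the L < c(G) condition: connectivity gives the algorithm more simultaneous useful messages than the adversary can destroy, so progress per round is guaranteed.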
The disadvantage of this approach is that it neglects the fact that in real systems the number of lost messages is generally a function of the number of all message transmissions. This leads to an anomaly of the cumulative model, where solutions that flood the network with large amounts of messages tend to work well, while their behavior in real faulty environments is often quite poor. In order to eliminate this unwanted feature from the model, the so-called fractional model was introduced in [15]. This deterministic setting explicitly takes into account the interaction between the number of faults and the number of messages, bounding the number of faults that can occur at time t not by a fixed constant but rather by a linear fraction α·m_t of the total number m_t of messages sent at time t, for some (fixed, known) constant 0 ≤ α < 1. The advantage of the fractional model is that solutions designed for it tolerate the loss of up to a fraction of all transmitted messages. The anomaly of the fractional model is that, in this setting, transmitting a single message per communication round ensures its delivery; thus, the model leads to very counterintuitive algorithms which do not behave well in real faulty environments. Summarizing, to obtain optimal solutions, message redundancy must be avoided in the fractional model, while massive redundancy of messages must be used in the cumulative model; in real systems, neither solution might fare well. In many ways, the two models are opposite extremes. The lesson to be learned from their anomalies is that, on one hand, there is a need for redundant communication, but, on the other hand, brute-force algorithms based on repeatedly flooding the network do not necessarily solve the problem. In this paper, we propose a deterministic model that combines the cumulative and fractional models in a way that might better reflect reality.
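The fractional-model anomaly follows directly from the bound on losses (a minimal sketch; the function name is ours): since the number of faults at time t is at most ⌊α·m_t⌋ and α < 1, a round with a single transmission can lose nothing, while a flooded round forfeits almost everything.

```python
import math

def max_losses(m, alpha):
    """Fractional model: at most floor(alpha * m) of the m messages
    sent in a round can be lost, for a fixed constant 0 <= alpha < 1."""
    return math.floor(alpha * m)

# The counterintuitive consequence: one message per round is always
# delivered, because floor(alpha * 1) = 0 whenever alpha < 1 ...
assert max_losses(1, 0.99) == 0
# ... whereas flooding invites losses: of 100 messages, 99 may vanish.
assert max_losses(100, 0.99) == 99
```

This is exactly the opposite incentive to the cumulative model, where extra messages only help; the fractional threshold model of this paper is designed to sit between these two extremes.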
This model is actually more general, in that it includes those models as particular, extreme cases. It also defines a spectrum of settings that avoid the anomalies of both extremes.

Fractional threshold and broadcast

The failure model we consider, which we shall call fractional dynamic faults with threshold, or simply the fractional threshold model, is a combination of the fractional model with the cumulative model. Both the fractional and the cumulative models can be described as a game between the algorithm and an adversary: in a time step t, the algorithm
doi:10.1016/j.tcs.2008.02.008