On Fractional Dynamic Faults with Threshold

Stefan Dobrev, Rastislav Královič, Richard Královič, Nicola Santoro
2006 Lecture Notes in Computer Science  
⋆ Partially supported by VEGA 1/3106/06, NSERC, and TECSIS Co.

Dynamic Faults

In a message-passing distributed computing environment, entities communicate by sending messages to their neighbors in the underlying communication network. However, messages might be lost during transmission. The presence of communication faults makes the solution of problems difficult, if not impossible. In particular, in asynchronous settings, the mere possibility of faults renders almost all nontrivial tasks unsolvable, even if the faults are localized to (i.e., restricted to occur on the links of) a single entity [11]. Because of this inherent difficulty of asynchrony, the focus is on synchronous environments, from the point of view of both theoretical investigation and industrial application (e.g., communication protocols for wireless networks). Since synchrony provides a perfect omission detection mechanism [2], localized faults are easily dealt with in these systems; indeed, any number of faulty links can be tolerated provided they do not disconnect the network.

The immediate question is then whether synchrony also allows dynamic communication faults to be tolerated; that is, faults that are not restricted to a fixed (but a priori unknown) set of links, but can occur between any two neighbors [17]. These types of faults, also called mobile or ubiquitous, are clearly more difficult to handle. In this regard, the investigations have focused mostly on the basic problem of broadcast: an entity has some information that it must communicate to all other entities in the network. Indeed, whether or not this task can be performed has immediate consequences for many other tasks. Not surprisingly, a large research effort has been devoted to the analysis of broadcasting in the presence of dynamic communication faults.

Clearly, no computation, including broadcast, is possible if the number of faults that can occur per time unit and the modality of their occurrence are unrestricted. The research has thus focused on determining under which conditions on the faults nontrivial computations can be performed in spite of those faults. Constructively, the effort is on designing protocols that correctly solve a problem provided that some restrictions on the occurrence of faults hold. A first large group of investigations has considered the so-called cumulative model; that is, there is a (known) limit L on the number⁴ of messages that can be lost at each time unit.
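To make the cumulative model concrete, here is a minimal sketch (not the authors' protocol) of synchronous flooding with repeated transmissions against an adversary that may destroy at most L messages per round. The function names `flood_cumulative` and `ring` and the greedy adversary are hypothetical and chosen for illustration only. The adversary is a heuristic, not necessarily the worst case. Because every informed node resends in every round, each edge between informed and uninformed nodes carries one copy per round, so the comparison between L and the edge connectivity c(G) decides whether some copy always gets through.

```python
def flood_cumulative(adj, source, L, max_rounds=1000):
    """Flooding with repeated transmissions under the cumulative fault model.

    adj: dict mapping each node to a list of neighbours (undirected graph).
    In every round each informed node resends the message to all neighbours,
    and an adversary may destroy at most L of that round's messages.
    The adversary is a hypothetical greedy heuristic: it only drops messages
    that would reach uninformed nodes, and it first blocks the uninformed
    nodes that receive the fewest copies.
    Returns the round in which the last node is informed, or None if the
    adversary prevents completion within max_rounds.
    """
    informed = {source}
    if len(informed) == len(adj):
        return 0
    for rnd in range(1, max_rounds + 1):
        # Copies crossing the cut from informed to uninformed nodes.
        incoming = {}
        for u in informed:
            for v in adj[u]:
                if v not in informed:
                    incoming.setdefault(v, []).append(u)
        budget = L
        newly = set()
        for v, senders in sorted(incoming.items(), key=lambda kv: len(kv[1])):
            if len(senders) <= budget:
                budget -= len(senders)  # every copy addressed to v is dropped
            else:
                newly.add(v)  # budget insufficient: at least one copy reaches v
        informed |= newly
        if len(informed) == len(adj):
            return rnd
    return None


def ring(n):
    """Cycle on n >= 3 nodes; its edge connectivity c(G) is 2."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
```

On a ring, `flood_cumulative(ring(6), 0, 1)` returns 5: with L = 1 < c(G) = 2, exactly one new node is informed per round, i.e. n − 1 rounds. With L = 2 = c(G), this adversary drops both copies leaving the informed arc in every round, and the call returns None.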
If the limit is less than the edge connectivity of the network, L < c(G), then broadcast can be achieved simply by flooding and repeating transmissions for an appropriate amount of time. The research has focused on determining the smallest amount of time required, in general or for specific topologies [3-6, 8-10, 12, 14, 15], as well as on how to use broadcast to efficiently compute functions and achieve other tasks [7, 18, 19]. The advantage of the cumulative model is that solutions designed for it are L-tolerant; that is, they tolerate up to L communication faults per time unit. The disadvantage of this approach is that it neglects the fact that, in real systems, the number of lost messages is generally a function of the total number of message transmissions. This leads to an anomaly of the cumulative model: solutions that flood the network with large numbers of messages tend to work well in the model, while their behavior in real faulty environments is often quite poor.

A setting that takes into account the interplay between the number of transmissions and the number of losses is the probabilistic model: there is no a priori upper bound on the total number of faults per time unit, but each transmission has a (known) probability p < 1 of failing. The investigations in this model have focused on designing broadcasting algorithms with low time complexity and high probability ...

⁴ Since the faults are dynamic, there is clearly no restriction on their location.
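The probabilistic model can be illustrated with a small Monte Carlo sketch. It assumes flooding with repeated transmissions and, as an assumption of this sketch, that losses are independent across transmissions and rounds, each occurring with probability p. It estimates how often broadcast completes within a given number of rounds. The names `flood_probabilistic` and `estimate_success` are hypothetical. Unlike the cumulative model, the expected number of losses per round grows with the number of messages sent (a fraction p of them).

```python
import random


def flood_probabilistic(adj, source, p, rounds, rng):
    """One run of flooding in which every transmission fails independently
    with probability p. Returns True if all nodes are informed within the
    given number of rounds."""
    informed = {source}
    for _ in range(rounds):
        newly = set()
        for u in informed:
            for v in adj[u]:
                # Copies sent to already-informed nodes are also exposed to
                # loss, but since losses are independent, skipping their
                # sampling does not change the outcome distribution.
                if v not in informed and rng.random() >= p:
                    newly.add(v)  # this copy survived the channel
        informed |= newly
        if len(informed) == len(adj):
            return True
    return len(informed) == len(adj)


def estimate_success(adj, source, p, rounds, trials=10_000, seed=0):
    """Fraction of independent runs in which broadcast completes in time."""
    rng = random.Random(seed)  # seeded for reproducible estimates
    hits = sum(flood_probabilistic(adj, source, p, rounds, rng)
               for _ in range(trials))
    return hits / trials
```

For example, on a 6-node ring with p = 0, broadcast from node 0 always finishes in exactly 3 rounds, so the estimate is 1.0 for `rounds=3` and 0.0 for `rounds=2`. Raising p or lowering the round budget trades completion time against success probability.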
doi:10.1007/11780823_16