Distributed redundancy and robustness in complex systems

Martin Randles, David Lamb, E. Odat, A. Taleb-Bendiab
2011 Journal of computer and system sciences (Print)  
The uptake and increasing prevalence of Web 2.0 applications, promoting new large-scale and complex systems such as Cloud computing and the emerging Internet of Services/Things, require tools and techniques to analyse and model methods to ensure the robustness of these new systems. This paper reports on assessing and improving complex system resilience using distributed redundancy, termed degeneracy in biological systems, to endow large-scale complicated computer systems with the same ... that emerges in complex biological and natural systems. However, in order to promote an evolutionary approach, through emergent self-organisation, it is necessary to specify the systems in an 'open-ended' manner where not all states of the system are prescribed at design time. In particular, an observer system is used to select robust topologies, within system components, based on a measurement of the first non-zero eigenvalue in the Laplacian spectrum of the components' network graphs, also known as the algebraic connectivity. It is shown, through experimentation on a simulation, that increasing the average algebraic connectivity across the components in a network leads to an increase in the variety of individual components, termed distributed redundancy: the capacity for structurally distinct components to perform an identical function in a particular context. The results are applied to a specific application where active clustering of like services is used to aid load balancing in a highly distributed network. Using the described procedure is shown to improve performance and distribute redundancy.
Robustness is observed in many biological systems; it is increasingly accepted as a fundamental property of complex evolvable systems [2, 3], which, for instance, enables the persistence of a given function in spite of external or internal perturbations. Many examples of robustness can be found in well-studied natural system models, such as ant foraging, herding, flocking and schooling [4], or regulatory networks within cellular and multi-cellular individual organisms [2].
Distributed redundancy (referred to as variety in cybernetics literature or degeneracy in biological systems [5]) has been observed to be ubiquitous in these and many other natural/biological systems, contributing significantly to their resilience [6]; for instance, in the genetic code, many different nucleotide sequences encode the same polypeptide, and there are many different ways in which communication may be achieved between animals (even within human language) [5]. Distributed redundancy, as an emergent property of the global system, arises out of the individual components' interactions and distributed connectivity. It is important to note the difference between regular redundancy and distributed redundancy; the distinction is clearly seen in the comparison between design and selection. For an engineered system, redundancy is built into the design to provide fail-safe operation: unplanned interactions are ruled out, specific functions are aligned with particular components, and no adaptation is expected in response to failure. A biological system, on the other hand, has no design; it is evolutionary in nature, any part may change or mutate to contribute to a function, there is no fixed assignment of function to components, and interactions become very complex. Thus redundancy, in engineered systems, simply consists of providing spare components to identically replace failed or failing components in the system. Biological systems, in contrast, adapt or make different uses of existing components to replace failed or failing system parts. For instance, in the previously given example of communication, if communication through speech (say) becomes impossible in a biological system, then other system attributes, such as sign language, may be utilised to accomplish the same outcome.
Alternatively, if communication in an engineered system through radio (say) becomes impossible, then a faulty component is diagnosed and replaced with an identical working component. It is thus clear that, far from being selected by evolution, distributed redundancy is rather a necessary condition for effective adaptation: distributed redundancy refers to different elements facilitating the same outcomes, whereas regular redundancy refers to the function of identical elements.
Contributions
Whilst the concepts of robustness, distributed connectivity and redundancy might be intuitive, many classical approaches to robust design already exist, and the emergence of robust structure has inspired many computational models and applications in, for instance, P2P and self-organising network management, grid resource optimisation and scheduling, and swarm-based service composition [6], it is not yet fully understood how to characterise or engineer robust structure as a general emergent feature in such computational systems. Such models as have been proposed are generally not appropriate for large-scale decentralised dynamic systems. In addition, there is little engineering understanding as to how to characterise, analyse or measure robustness in these large-scale decentralised dynamic systems. As previously stated, it is thought that robustness is a feature of evolving complex and dynamic systems [2], with engineered robustness facilitating evolution and evolution favouring robust traits. Thus there are structural requirements for systems to be evolvable, involving the capacity to produce more robust components. This, allied with the hierarchical modular nature of the structures, suggests that a nested bow-tie or hourglass structure [7] may best capture the dynamics, where various input and output modules are connected through a conserved core with extensive system control, particularly as this architecture has emerged as an underlying feature of the World Wide Web [8].
The heterogeneous nature of the environments and participants, the specialised computational powers required to drive the processes within the components whilst handling the vast amounts of resultant data, and the need for extensive communication between components to facilitate robustness through emergent organisation mean that the modelling environment is required to capture both the dynamically changing nature of the system at all its levels (from global down to local) and the static aspect of the data set at any discrete time point. This paper investigates how to move towards bringing aspects of biological robustness (such as distributed redundancy) to computer systems. A technique is discussed that uses system rewiring to promote increased connectivity and increases robustness, as demonstrated by a measurable increase in distributed redundancy. An enhanced network-rewiring algorithm mediated by algebraic connectivity analysis is developed, which provides a measured increase in the identified robustness metric, distributed redundancy. This is used, for instance, to steer a given network rewiring (generation) process towards robust topologies/configurations. The dynamic nature of current systems means that network rewiring is a frequent occurrence. For instance, load balancing on complex networks is often performed by creating and deleting network connections to optimise the placement of work over the network. In particular, active clustering is a recently investigated technique whereby like services are rewired together to provide an easy distribution of the load on heterogeneous systems.
This active clustering load-balancing procedure [9] is used in this paper to test the application of the discussed techniques, whereby clustering is mediated by algebraic connectivity: the clustering proceeds only if an increase in the algebraic connectivity of the network (shown in this paper to map to increased distributed redundancy) is observed.
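The gating rule described above can be illustrated with a minimal sketch. It is not the paper's algorithm: the function names algebraic_connectivity and gated_rewire, the edge-list representation and the use of NumPy are all assumptions made for illustration. The sketch takes the algebraic connectivity to be the second-smallest Laplacian eigenvalue, which coincides with the first non-zero eigenvalue only when the graph is connected, and it accepts a proposed rewiring (one link added, one removed) only if that value strictly increases.

```python
import numpy as np


def algebraic_connectivity(n, edges):
    """Second-smallest eigenvalue of the Laplacian L = D - A of an undirected graph."""
    adjacency = np.zeros((n, n))
    for u, v in edges:
        adjacency[u, v] = adjacency[v, u] = 1.0
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order.
    return float(np.linalg.eigvalsh(laplacian)[1])


def gated_rewire(n, edges, add, remove):
    """Hypothetical gate: keep a proposed rewiring only if connectivity strictly increases."""
    current = {tuple(sorted(e)) for e in edges}
    candidate = (current - {tuple(sorted(remove))}) | {tuple(sorted(add))}
    before = algebraic_connectivity(n, current)
    after = algebraic_connectivity(n, candidate)
    if after > before + 1e-9:  # tolerance guards against floating-point noise
        return candidate, True
    return current, False


# Path 0-1-2-3: link node 0 to node 2 and drop its link to node 1.
path = [(0, 1), (1, 2), (2, 3)]
path_result, path_accepted = gated_rewire(4, path, add=(0, 2), remove=(0, 1))

# Ring 0-1-2-3-0: the same proposal leaves node 1 hanging off a triangle.
ring = path + [(0, 3)]
ring_result, ring_accepted = gated_rewire(4, ring, add=(0, 2), remove=(0, 1))

print(path_accepted, sorted(path_result))
print(ring_accepted, sorted(ring_result))
```

On the path, the proposal yields a star centred on node 2 and is accepted, since the algebraic connectivity rises from 2 - sqrt(2) to 1. On the ring, the same proposal is rejected, since the value would fall from 2 to 1, and the original edge set is returned unchanged.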
doi:10.1016/j.jcss.2010.01.008 fatcat:7e6ureui7vc35o5ajkhaqi2l3q