Comparing genomes to computer operating systems in terms of the topology and evolution of their regulatory control networks

K.-K. Yan, G. Fang, N. Bhardwaj, R. P. Alexander, M. Gerstein
2010 Proceedings of the National Academy of Sciences of the United States of America  
The genome has often been called the operating system (OS) for a living organism. A computer OS is described by a regulatory control network termed the call graph, which is analogous to the transcriptional regulatory network in a cell. To apply our firsthand knowledge of the architecture of software systems to understand cellular design principles, we present a comparison between the transcriptional regulatory network of a well-studied bacterium (Escherichia coli) and the call graph of a
... al OS (Linux) in terms of topology and evolution. We show that both networks have a fundamentally hierarchical layout, but there is a key difference: The transcriptional regulatory network possesses a few global regulators at the top and many targets at the bottom; conversely, the call graph has many regulators controlling a small set of generic functions. This top-heavy organization leads to highly overlapping functional modules in the call graph, in contrast to the relatively independent modules in the regulatory network. We further develop a way to measure evolutionary rates comparably between the two networks and explain this difference in terms of network evolution. The process of biological evolution via random mutation and subsequent selection tightly constrains the evolution of regulatory network hubs. The call graph, however, exhibits rapid evolution of its highly connected generic components, made possible by designers' continual fine-tuning. These findings stem from the design principles of the two systems: robustness for biological systems and cost effectiveness (reuse) for software systems.

systems biology | adaptive complex systems

Complex systems are characterized by interactions among huge numbers of heterogeneous constituents. In particular, many complex systems are adaptive, meaning the interconnections are shaped progressively by a changing environment. The driving forces of adaptation are common design principles such as the reduction of cost and the enhancement of system robustness (1). Optimal solutions are determined by trade-offs between conflicting principles and therefore vary from system to system. Over the past decade, the study of networks has emerged as an interdisciplinary research field aiming to discover the underlying principles of complex systems and to develop tools or algorithms for analyzing them.
By capturing the interconnections between individual components, networks not only serve as backbones to study the emergent properties of complex systems, but they also provide an abstract framework that facilitates the cross-disciplinary comparison of different adaptive complex systems, ranging from biological systems to technological ones (2). Cross-disciplinary comparison between biological systems and commonplace systems such as organization hierarchies (3, 4) and engineering devices should be of particular interest to systems biologists. Despite tremendous advancement in high-throughput experiments and computational algorithms, the study of biological systems in general still suffers from limitations in accuracy and completeness of data. Insights gained from systems to which we have direct access and which we understand thoroughly can be carried over to biological ones.

Like biological systems, software systems such as a computer operating system (OS) are adaptive systems undergoing evolution. Whereas the evolution of biological systems is subject to natural selection, the evolution of software systems is under the constraints of hardware architecture and customer requirements. Since the pioneering work of Lehman (5), the evolutionary pressure on software has been studied by engineers. Interestingly enough, biological and software systems both execute information processing tasks. Whereas biological information processing is mediated by complex interactions between genes, proteins, and various small molecules, software systems exhibit a comparable level of complexity in the interconnections between functions. Understanding the structure and evolution of their underlying networks sheds light on the design principles of both natural and man-made information processing systems.

The master control plan of a cell is its transcriptional regulatory network.
The transcriptional regulatory network coordinates gene expression in response to environmental and intracellular signals, resulting in the execution of cellular processes such as cell division and metabolism. Understanding how cellular control processes are orchestrated by transcription factors (TFs) is a fundamental objective of systems biology (6-9), and therefore a great deal of effort has been focused on understanding the structure and evolution of transcriptional regulatory networks.

Analogous to the transcriptional regulatory network in a cell, a computer OS consists of thousands of functions organized into a so-called call graph, which is a directed network whose nodes are functions, with a directed edge leading from each function to every function it calls. Whereas the genome-wide transcriptional regulatory network and the call graph are static representations of all possible regulatory relationships and calls, both transcription regulation and function activation are dynamic. Different sets of transcription factors and target genes forming so-called functional modules (10) are activated at different times and in response to different environmental conditions. In the same way, complex OSs are organized into modules consisting of functions that are executed for various tasks.

Here we perform a one-to-one comparison between the transcriptional regulatory network of Escherichia coli and the call graph of the Linux kernel, which are both canonical systems. E. coli is one of the best-annotated model organisms.
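
To make the shared representation concrete, the following is a minimal sketch (not the authors' code or data) of how both a call graph and a transcriptional regulatory network can be stored as directed edge lists and summarized in the same way. The function names `build_graph` and `summarize`, and every node name in the two toy edge lists, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_graph(edges):
    """Return (out_neighbors, in_neighbors) maps for a directed edge list.

    An edge (src, dst) means "src calls dst" in a call graph, or
    "src regulates dst" in a transcriptional regulatory network.
    """
    out_nb, in_nb = defaultdict(set), defaultdict(set)
    for src, dst in edges:
        out_nb[src].add(dst)
        in_nb[dst].add(src)
    return out_nb, in_nb


def summarize(edges):
    """Count controlling nodes, pure targets, and the largest in-degree."""
    out_nb, in_nb = build_graph(edges)
    nodes = set(out_nb) | set(in_nb)
    # Nodes with at least one outgoing edge (callers or regulators).
    regulators = {n for n in nodes if out_nb.get(n)}
    # Nodes with no outgoing edges (pure callees or pure targets).
    leaves = nodes - regulators
    max_in = max((len(in_nb.get(n, ())) for n in nodes), default=0)
    return {
        "n_nodes": len(nodes),
        "n_regulators": len(regulators),
        "n_leaves": len(leaves),
        "max_in_degree": max_in,
    }


# Hypothetical toy call graph: several callers share two generic helpers.
toy_call_graph = [
    ("f1", "log"), ("f2", "log"), ("f3", "log"),
    ("f1", "alloc"), ("f2", "alloc"),
]

# Hypothetical toy regulatory network: one regulator, several targets.
toy_regulatory_network = [
    ("tf1", "g1"), ("tf1", "g2"), ("tf1", "g3"), ("tf1", "g4"),
]

if __name__ == "__main__":
    print("call graph:", summarize(toy_call_graph))
    print("regulatory network:", summarize(toy_regulatory_network))
```

In the toy call graph most nodes are callers and the shared helpers have high in-degree, whereas in the toy regulatory network a single regulator sits above many targets. These toy inputs only mirror the qualitative contrast described in the abstract; they say nothing about the actual E. coli or Linux networks.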
doi:10.1073/pnas.0914771107 pmid:20439753 pmcid:PMC2889091 fatcat:cl7l3ij2ffberkblhqw2wkcbgu