Structural robustness of mammalian transcription factor networks reveals plasticity across development
Network biology aims to understand cell behavior through the analysis of underlying complex biomolecular networks. Inference of condition-specific interaction networks from epigenomic data enables the characterization of the structural plasticity that regulatory networks can acquire in different tissues of the same organism. From this perspective, uncovering specific patterns of variation by comparing network structure among tissues could provide insights into systems-level mechanisms
... cell behavior. Following this idea, here we propose an empirical framework to analyze mammalian tissue-specific networks, focusing on characterizing and contrasting their structure and behavior in response to perturbations. We structurally represent the state of the cell/tissue by condition-specific transcription factor networks generated using chromatin accessibility data, and we profile their systems behavior in terms of structural robustness against random and directed perturbations. Using this framework, we unveil the structural heterogeneity existing among tissues at different levels of differentiation. We uncover a novel and conserved systems property of regulatory networks underlying embryonic stem cells (ESCs): in contrast to terminally differentiated tissues, the promiscuous regulatory connectivity of ESCs produces a globally homogeneous network resulting in increased structural robustness. Possible biological consequences of this property are discussed.
bioRxiv preprint first posted online Oct. 26, 2017; doi: http://dx.doi.org/10.1101/209528. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. It is made available under a CC-BY-NC-ND 4.0 International license.
A central tenet of systems biology is that cell behavior can be understood in terms of the structure and dynamics of underlying complex molecular networks. 1, 2 Under this paradigm, major efforts have been made to systematically map and characterize the properties of molecular networks at different levels of organization. Reference protein-protein interaction, metabolic, and transcriptional regulatory networks have been constructed and are frequently updated in several model organisms. Initial efforts have largely focused on providing an organismal reference for the global network structure. Network theory provides methods for the systemic description of a network's structure and its dynamics. 6-8 One of the major results of network biology is the discovery within the reference networks of apparently universal organizational properties across the different types of complex biological networks. 2 While the characterization of reference real-world complex networks has uncovered structural similarities among complex networks that are believed to underlie their systemic properties, 2, 6 much less is known about the degree of structural heterogeneity of condition-specific biomolecular networks, and how patterns of variation promote or constrain systems-level behaviors. In cell biology, one intriguing hypothesis is that network heterogeneity emanating from the normal process of development might result in differential behaviors underlying the contrasting cellular phenotypes. In line with this idea, the field of network biology has recently started shifting towards the characterization of condition-specific networks and the analysis of circuitry dynamics, 9, 10 presumably due to the increasing availability of functional genomics and epigenomics assays. For example, Neph and collaborators put forward a methodology to assemble tissue-specific transcription factor networks with the aid of available chromatin accessibility profiles from multicellular genomes. 9, The proposed networks connect each transcription factor (TF) to its incoming TF regulators, thus representing the regulatory structure of the cell in terms of the main regulators (e.g. TFs) and the mutual regulatory interactions among them. More specifically, using digital genomic footprinting (DGF) analysis, TF-TF interactions are established by integrating TF motif matching with DNase I hypersensitive sites (DHS) and high-resolution genomic footprints. Tissue specificity comes from the condition-specific accessibility of cis-regulatory regions upstream of a TF. Using this approach, tissue-specific TF networks have been constructed for model organisms and for human. 
9, 14 Given that the observed TF interactions reflect tissue-specific activity states, we reasoned that the structure and relative systems-level behavior displayed by these networks could provide insights into the biology and differentiation potential of the corresponding tissues. In order to begin understanding the link between network structure heterogeneity, behavior, and biological phenotypes, here we put forward a computational framework to characterize the structural properties of mammalian tissue-specific TF networks and their behavior, emphasizing the degree of deviation from theoretical expectations. We focus on one systems-level behavior that is informative of the latter: the robustness of the networks to structural perturbations. To this end, we profiled the structural properties of a broad set of TF networks in mouse and human, and we compared the observed behavior across tissues and with expectations from theoretical models. Interestingly, we discovered that embryonic stem cells (ESCs) possess a distinctive regulatory structure: their higher structural similarity to the topological properties expected from a homogeneous network theoretical model endows them with remarkably resilient behavior. We discuss potential biological implications.
Results
Analysis framework
Networks provide a theoretical framework that allows a convenient conceptual representation of interrelations among a large number of elements. 6 Furthermore, it is usually possible to frame questions about the behavior of the underlying real system by applying well-established measures and analyses over the network representing the empirical data. 15 Here we focus on tissue-specific networks where nodes represent TFs and links represent inter-regulatory interactions, and we propose an analysis framework with the goal of characterizing the commonalities and differences in behavior against structural perturbations across tissues. 
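As a rough illustration of the kind of structural perturbation profiling described here, the following minimal sketch removes a growing fraction f of nodes either in random order (errors) or in decreasing order of degree (directed attacks) and records the relative size of the largest connected component. It is a sketch under stated assumptions, not the authors' implementation: the network is treated as undirected, the attack order is fixed from the initial degrees rather than recomputed after each removal, and all function names and the toy network are hypothetical.

```python
import random
from collections import deque


def giant_component_size(adj, removed):
    """Size of the largest connected component, ignoring nodes in `removed`."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        best = max(best, size)
    return best


def robustness_profile(adj, order, fractions):
    """Relative giant component size after removing the first f*N nodes of `order`."""
    s0 = giant_component_size(adj, set())  # assumes a non-empty network
    n = len(adj)
    profile = []
    for f in fractions:
        removed = set(order[:int(round(f * n))])
        profile.append(giant_component_size(adj, removed) / s0)
    return profile


def random_order(adj, seed=0):
    """Random failure ("error") order: nodes shuffled uniformly."""
    nodes = list(adj)
    random.Random(seed).shuffle(nodes)
    return nodes


def attack_order(adj):
    """Directed attack order: highest initial degree first (assumption: static ranking)."""
    return sorted(adj, key=lambda node: len(adj[node]), reverse=True)


# Toy usage on a hypothetical 6-node star network (not data from the study).
toy = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}
fractions = [0.0, 0.25, 0.5]
error_curve = robustness_profile(toy, random_order(toy, seed=1), fractions)
attack_curve = robustness_profile(toy, attack_order(toy), fractions)
```

Comparing `error_curve` with `attack_curve` for the same network gives one simple way to contrast tolerance to random failures with tolerance to targeted removal of hubs.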
We ask whether some tissues display extreme behaviors, and whether such deviations and extreme behaviors highlight aspects of the underlying biology. We hypothesize that the differences to be discovered underlie aspects of the observed ...
Overall, all networks were found to be highly tolerant to random errors. In both mouse and human tissues, the size of the giant component ($S_f/S_0$) decreases linearly with $f$ without abrupt transitions (Figure 2a and c, dashed lines). The efficiency of the networks ($E_f/E_0$) also behaves consistently across all human and mouse tissues: it decreases minimally over a large range of $f$ until it falls abruptly around $f = 0.8$ (Figure 2b and d, dashed lines). The observed robustness to random failures is consistent with predictions from percolation theory in complex random networks, as random removal is less likely to perturb key, highly connected components in networks with a long-tailed degree distribution. 6, 16 Also consistent with theory, TF networks were ...
Figure caption (fragment): ... correspond to the ESC behavior and blue lines to other cell types. a) Human giant component size decrease. b) Human efficiency decrease. c) Mouse giant component size decrease. d) Mouse efficiency decrease. Error-attack deviation measured for each TF network. Human e) giant component and f) efficiency. Mouse h) giant component and i) efficiency. g) Human and j) mouse $\Delta_{ea}$ distributions; red dots correspond to ESC measurements.
... its structural robustness. The random forest model also shows high accuracy, with a cross-validation mean squared error of 0.00025. The most informative topological features are degree entropy and efficiency. 
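The efficiency measure used above is not defined in this excerpt. A minimal sketch, assuming the commonly used global efficiency (the average of inverse shortest-path lengths over all ordered node pairs, with unreachable pairs contributing zero), could look as follows. The normalization by the number of surviving nodes, the function names, and the optional `removed` argument are assumptions for illustration; the ratio of the perturbed value to the unperturbed value would then play the role of the relative efficiency.

```python
from collections import deque


def bfs_distances(adj, source, removed):
    """Shortest-path lengths (in links) from `source`, skipping removed nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in removed and neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def global_efficiency(adj, removed=frozenset()):
    """Mean inverse shortest-path length over ordered pairs of surviving nodes."""
    nodes = [node for node in adj if node not in removed]
    n = len(nodes)
    if n in (0, 1):
        return 0.0  # no pairs to average over
    total = 0.0
    for source in nodes:
        for target, d in bfs_distances(adj, source, removed).items():
            if target != source:
                total += 1.0 / d  # unreachable targets are absent and add 0
    return total / (n * (n - 1))


# Toy usage on a hypothetical 3-node path (not data from the study).
path = {0: {1}, 1: {0, 2}, 2: {1}}
e0 = global_efficiency(path)
ef = global_efficiency(path, removed={1})
relative_efficiency = ef / e0
```

Because `adj[node]` is read as the set of nodes reachable in one step, the same sketch also works for a directed network if each entry lists out-neighbors only.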
Comparing the models, we see that a simple linear regression has higher predictive accuracy than the random forest model including 11 topological features. This is reinforced by the fact that the most influential feature of the random forest model is degree entropy, a feature correlated with a network's similarity to a homogeneous network and found to characterize ESC networks. Thus, dissimilarity to the E-R model network, $D_{E-R}$, a measure quantifying the degree of homogeneity of a real-world network, is predictive of its structural robustness.
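Degree entropy is named above as the most influential feature, but its formula is not given in this excerpt. A minimal sketch, assuming it refers to the Shannon entropy of the empirical degree distribution, is shown below. The function name, the use of base-2 logarithms, and the use of total neighbor count as the degree are assumptions for illustration.

```python
import math
from collections import Counter


def degree_entropy(adj):
    """Shannon entropy (bits) of the empirical degree distribution.

    Assumption: degree is the number of neighbors listed for each node.
    """
    degrees = [len(neighbors) for neighbors in adj.values()]
    n = len(degrees)
    counts = Counter(degrees)  # how many nodes have each degree value
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Toy usage on hypothetical networks (not data from the study).
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
star = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}
h_triangle = degree_entropy(triangle)
h_star = degree_entropy(star)
```

A value computed this way could serve as one column of a feature table for a regression model, alongside other topological descriptors.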