Stable Gene Regulatory Network Modeling From Steady-State Data

Joy Larvie, Mohammad Sefidmazgi, Abdollah Homaifar, Scott Harrison, Ali Karimoddini, Anthony Guiseppi-Elie
2016 Bioengineering  
Gene regulatory networks represent an abstract mapping of gene regulations in living cells. They aim to capture dependencies among molecular entities such as transcription factors, proteins and metabolites. In most applications, the regulatory network structure is unknown and has to be reverse engineered from experimental data consisting of gene expression levels, usually measured as messenger RNA concentrations in microarray experiments. Steady-state gene expression data are obtained from measurements of the variations in expression activity following the application of small perturbations to equilibrium states in genetic perturbation experiments. In this paper, the least absolute shrinkage and selection operator-vector autoregressive (LASSO-VAR) model, originally proposed for the analysis of economic time series data, is adapted to include a stability constraint for the recovery of a sparse and stable regulatory network that describes data obtained from noisy perturbation experiments. The approach is applied to real experimental data obtained for the SOS pathway in Escherichia coli and the cell cycle pathway of the yeast Saccharomyces cerevisiae. Significant features of this method are the ability to recover networks without prior knowledge of the network topology and the ability to be applied efficiently to large-scale networks, owing to the convex nature of the method.

... information on interaction directions, and temporal data that allows for the investigation of temporal patterns in biological networks [8, 9]. Owing to their inherent ability to encapsulate the high-dimensional data of biological processes and pathways, networks have become an important tool in functional genomics [10, 11]. Researchers refer to any such network that provides a system-level view of the interactions among genes as a gene regulatory network (GRN) [12, 13]. GRNs are usually represented by directed graphs with genes as nodes and with edges depicting either an inhibition (negative regulation) or an activation (positive regulation) imposed by one gene on another through the production of a protein [14, 15]. The process of identifying genetic interactions from measured gene expression data is referred to as reverse engineering, network inference, or network recovery [7].
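As a concrete illustration of the signed, directed-graph view, the following Python sketch stores a toy GRN as a signed adjacency matrix: entry A[i][j] is +1 if gene j activates gene i, -1 if it inhibits it, and 0 if there is no edge. The genes G1 to G3, their edges and the row-as-target convention are hypothetical choices made for illustration, not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical toy network: (regulator, target, sign), +1 = activation, -1 = inhibition.
# These genes and edges are illustrative only, not taken from the paper.
EDGES: List[Tuple[str, str, int]] = [
    ("G1", "G2", +1),  # G1 activates G2
    ("G2", "G3", +1),  # G2 activates G3
    ("G3", "G1", -1),  # G3 inhibits G1
]


def signed_adjacency(genes: List[str], edges: List[Tuple[str, str, int]]) -> List[List[int]]:
    """Return A with A[i][j] = sign of the edge regulator j -> target i (0 if absent)."""
    index: Dict[str, int] = {g: k for k, g in enumerate(genes)}
    n = len(genes)
    A = [[0] * n for _ in range(n)]
    for regulator, target, sign in edges:
        if sign not in (-1, 1):
            raise ValueError(f"edge sign must be +1 or -1, got {sign}")
        A[index[target]][index[regulator]] = sign
    return A


def describe(genes: List[str], A: List[List[int]]) -> List[str]:
    """List every edge in words, e.g. 'G3 inhibits G1'."""
    out = []
    for i, target in enumerate(genes):
        for j, regulator in enumerate(genes):
            if A[i][j] != 0:
                verb = "activates" if A[i][j] > 0 else "inhibits"
                out.append(f"{regulator} {verb} {target}")
    return out


if __name__ == "__main__":
    genes = ["G1", "G2", "G3"]
    A = signed_adjacency(genes, EDGES)
    for row in A:
        print(row)
    print(describe(genes, A))
```

Storing regulators in columns and targets in rows keeps the matrix in the same orientation as a linear model x_next = A x, so the same array can later hold estimated interaction strengths instead of ±1 signs.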
Inferring the topology of GRNs and isolating functional subnetworks are computationally challenging tasks in contemporary functional genomics, and these efforts are valuable both for advancing scientific insight and for making the most of the time and cost invested in acquiring experimental data [16, 17, 18, 19]. GRNs typically contain information about the pathway to which a gene belongs and the genes it interacts with [16], which helps to reveal potential pathway initiators and drug targets [8]. Further analysis, mapping interactions among phenotypic and genotypic characteristics, can provide a framework for identifying biomarkers for medical diagnosis and prognosis [20, 21]. A plethora of modeling approaches, such as co-expression clustering [22], Boolean networks [23, 24], Bayesian networks [25] and ordinary differential equation (ODE) models [8], have been proposed for recovering genetic networks. Cluster analysis and the sequential search for patterns of gene expression related to some pathological state of interest usually provide only indirect information about the structure of the network [7]. Alternatively, co-expressed genes may be grouped using information-theoretic methods. Both approaches, however, lack causality [9]. Causality may be recovered through Bayesian networks, which can handle directed graphs [9, 26]. However, Bayesian networks typically do not accommodate cycles and hence are unable to handle the feedback motifs that are common in gene regulatory networks [26]. Causality and feedback motifs are no longer a problem when the network is modeled as a set of differential equations [26]. Excellent as they are at modeling causality and feedback motifs, however, differential equations are suitable only for small-scale networks [9]. Moreover, these existing techniques rely heavily on temporal expression data, which can be very difficult to acquire, and they require high computational effort [8, 26].
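The argument about feedback motifs rests on a graph property: the structure of a standard Bayesian network must be a directed acyclic graph, so a regulatory network containing a feedback loop cannot be encoded directly as one. The sketch below checks a regulator-to-target graph for such loops with a depth-first search; the function name and gene labels are hypothetical placeholders.

```python
from typing import Dict, List


def has_feedback_loop(graph: Dict[str, List[str]]) -> bool:
    """Return True if the directed regulator -> target graph contains a cycle.

    Hypothetical helper. A standard Bayesian network must be a directed acyclic
    graph (DAG), so any graph for which this returns True cannot be used
    directly as a Bayesian network structure.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    color = {node: WHITE for node in nodes}

    def visit(node: str) -> bool:
        color[node] = GREY
        for nxt in graph.get(node, []):
            if color[nxt] == GREY:  # back edge: returned to a node on the current path
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    # Start a search from every node not already reached by an earlier search.
    return any(color[n] == WHITE and visit(n) for n in sorted(nodes))


if __name__ == "__main__":
    cascade = {"G1": ["G2", "G3"], "G2": ["G3"]}        # feed-forward, acyclic
    feedback = {"G1": ["G2"], "G2": ["G3"], "G3": ["G1"]}  # three-gene loop
    print(has_feedback_loop(cascade), has_feedback_loop(feedback))
```

A self-regulating gene (an edge from a gene to itself) also counts as a cycle here, which matches the fact that autoregulation cannot appear in a DAG either.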
Sparsity, stability and causality are major considerations that must be captured in the biological network recovery process [2]. In this paper, the least absolute shrinkage and selection operator-vector autoregressive (LASSO-VAR) model, originally proposed in [27] for the analysis of economic time series data, is adapted to include a stability constraint, as defined and used in [26], for the recovery of sparse and stable regulatory networks that describe steady-state data obtained from noisy perturbation experiments. Because LASSO-VAR is a vector autoregressive process, Granger causality can be inferred. The technique requires only one tuning parameter, which penalizes non-sparse networks, and this parameter is selected on the basis of the mean square forecast error. The proposed identification algorithm can be used to identify the regulatory roles of individual genes and of control genes in the network. It can also be used to identify genes that directly affect the bioactivity of a compound in the cell. The approach is applied to real experimental data obtained for the SOS pathway in Escherichia coli and the cell cycle pathway of the yeast Saccharomyces cerevisiae. Significant features of this method are its ability to recover networks without a priori knowledge of the network topology and its efficient applicability to large-scale networks, owing to the convex nature of the method.

Methodology

This section introduces the stable LASSO-VAR, the identification technique adapted here for reverse engineering gene regulatory networks from steady-state data [28]. In its original form, the LASSO-VAR technique described in [27] finds applications in the analysis and prediction of economic ...
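The ingredients named above (a sparse VAR fit, a Granger-causal reading of the coefficients, a stability requirement and a single tuning parameter chosen by forecast error) can be illustrated with a minimal, dependency-free Python sketch. It is a simplification under stated assumptions, not the authors' implementation. It treats the samples as an ordered sequence x_1, ..., x_T and fits a first-order VAR, x_{t+1} ≈ A x_t, one row at a time by LASSO coordinate descent; how the paper arranges steady-state perturbation samples into this form is not described in this excerpt, so that layout is an assumption. Stability is only checked after fitting, with the sufficient condition max_i Σ_j |A_ij| < 1 (which bounds the spectral radius of A below 1), rather than imposed as a constraint during optimization as the text describes. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical helpers: a sketch of LASSO-VAR(1) with a post-hoc stability check.
Matrix = List[List[float]]


def soft_threshold(z: float, lam: float) -> float:
    """Proximal operator of lam * |b|: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X: Matrix, y: List[float], lam: float, iters: int = 200) -> List[float]:
    """Coordinate descent for min_b (1/2n)||y - Xb||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    col_sq = [sum(X[t][j] ** 2 for t in range(n)) / n for j in range(p)]
    for _ in range(iters):
        for j in range(p):
            if col_sq[j] == 0.0:
                b[j] = 0.0
                continue
            # Correlation of column j with the residual that excludes coordinate j.
            rho = sum(
                X[t][j] * (y[t] - sum(X[t][k] * b[k] for k in range(p) if k != j))
                for t in range(n)
            ) / n
            b[j] = soft_threshold(rho, lam) / col_sq[j]
    return b


def fit_lasso_var(series: Sequence[Sequence[float]], lam: float) -> Matrix:
    """Fit x_{t+1} ~ A x_t row by row; A[i][j] != 0 reads as 'gene j Granger-causes gene i'."""
    X = [list(row) for row in series[:-1]]  # lagged states x_t
    p = len(series[0])
    return [lasso_cd(X, [row[i] for row in series[1:]], lam) for i in range(p)]


def is_stable_sufficient(A: Matrix) -> bool:
    """Sufficient (not necessary) test: max_i sum_j |A[i][j]| < 1 implies spectral radius < 1."""
    return max(sum(abs(a) for a in row) for row in A) < 1.0


def forecast_mse(A: Matrix, series: Sequence[Sequence[float]]) -> float:
    """Mean squared one-step-ahead forecast error of x_{t+1} = A x_t."""
    errs = []
    for prev, nxt in zip(series[:-1], series[1:]):
        pred = [sum(a * x for a, x in zip(row, prev)) for row in A]
        errs.extend((yhat - yobs) ** 2 for yhat, yobs in zip(pred, nxt))
    return sum(errs) / len(errs)


def select_lambda(
    train: Sequence[Sequence[float]],
    valid: Sequence[Sequence[float]],
    grid: Sequence[float],
) -> Tuple[float, Matrix]:
    """Pick the lambda with the lowest validation forecast MSE among fits passing the check."""
    best = None
    for lam in grid:
        A = fit_lasso_var(train, lam)
        if not is_stable_sufficient(A):
            continue  # discard fits that fail the stability check
        err = forecast_mse(A, valid)
        if best is None or err < best[0]:
            best = (err, lam, A)
    if best is None:
        raise ValueError("no lambda in the grid produced a fit passing the stability check")
    return best[1], best[2]


if __name__ == "__main__":
    train = [[1, 0], [0, 1], [1, 0], [0, 1], [1, 0]]  # toy two-gene sequence
    valid = [[1, 0], [0, 1], [1, 0]]
    lam, A = select_lambda(train, valid, [0.0, 0.1, 0.3])
    print(lam, A)
```

In this toy run the unpenalized fit (λ = 0) reproduces a permutation matrix whose rows sum to exactly 1, so it fails the check. The grid search then keeps the penalized fit with the smallest one-step forecast error on `valid`. Because the check is only sufficient, some genuinely stable fits may also be rejected; an eigenvalue-based test would be exact but is not convex to impose during fitting.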
doi:10.3390/bioengineering3020012 pmid:28952574 pmcid:PMC5597136 fatcat:6ybul52o2jeunagb4r2cdrf4i4