Hidden Markov induced Dynamic Bayesian Network for recovering time evolving gene regulatory networks

Shijia Zhu, Yadong Wang
2015 Scientific Reports  
Dynamic Bayesian Networks (DBNs) have been widely used to recover gene regulatory relationships from time-series data in computational systems biology. Their standard assumption is 'stationarity', and several research efforts have recently been made to relax this restriction. However, those methods suffer from three challenges: long running time, low accuracy and reliance on parameter settings. To address these problems, we propose a novel non-stationary DBN model by extending each hidden node of Hidden Markov Model into a DBN (called HMDBN), which properly handles the underlying time-evolving networks. Correspondingly, an improved structural EM algorithm is proposed to learn the HMDBN. It dramatically reduces the search space, thereby substantially improving computational efficiency. Additionally, we derive a novel generalized Bayesian Information Criterion under the non-stationary assumption (called BWBIC), which can help significantly improve the reconstruction accuracy and largely reduce over-fitting. Moreover, the re-estimation formulas for all parameters of our model are derived, enabling us to avoid reliance on parameter settings. Compared to the state-of-the-art methods, the experimental evaluation of our proposed method on both synthetic and real biological data demonstrates more consistently high prediction accuracy and significantly improved computational efficiency, even with no prior knowledge or parameter settings.

Among the diverse tools available for analyzing temporal sequences, the Dynamic Bayesian Network (DBN) has been one of the most widely used to infer regulatory relationships in systems biology. The standard assumption underlying DBN is stationarity, that is, the structure and parameters of the DBN are fixed over time. However, this hypothesis is too restrictive and does not hold for many real biological problems. For instance, gene regulatory relationships and signal transduction processes in the cell are usually adaptive and change in response to environmental stimuli and growth phases, as in immune responses, cancer progression, and developmental processes. There have been various efforts to relax the stationary assumption for undirected graphical models, such as Markov Chain Monte Carlo (MCMC) and convex optimization-based Gaussian graphical models 1,2, and especially, the widely used l1-norm regression-based time-varying networks 3-7.
While these methods are all promising, their limitation is that undirected graphical models lack semantic interpretability when compared to the directed probabilistic graphical model DBN. The directed edges in a DBN bear a natural causal implication and are more likely to suggest regulatory relations.

Relaxing the stationary restriction in DBNs is a very recent research topic 8-13. These approaches are all based on a combination of DBN with a multiple change-point process, and the application of a Bayesian inference scheme via Reversible Jump Markov Chain Monte Carlo (RJMCMC) sampling. To be specific, the works 8,9 proposed a discrete non-stationary DBN, which allows for different structures in different segments of the time series, with global change points for all variables. The works 10,11 proposed a continuous inhomogeneous DBN, which assumes a fixed network structure and only allows the parameters to vary with time. The works 12,13 proposed an alternative continuous regression-based time-varying DBN with node-specific change points, that is, network structures associated with different nodes are allowed to change with time in different ways. These extended DBN models, however, still have obvious limitations, leaving room for further methodological innovation.

These works employ RJMCMC sampling to infer the non-stationary network. The primary disadvantage of sampling methods in comparison to search methods is that they often take much longer to converge on accurate results. Additionally, it is very important but difficult for a sampling technique to identify when the algorithm has converged. User experience is sometimes required to choose a suitable number of iterations based on the complexity of the problem.

Parameter settings. In these methods, different probability distributions are assumed to penalize the number of change points, such as exponential, negative binomial and Poisson distributions. Various parameters are introduced accordingly.
However, these works did not infer all parameters from data; some were set manually. The prediction can change substantially under different parameter settings, thereby resulting in inference uncertainty.

Scoring criteria. The relaxation of the stationary hypothesis for DBN leads to a highly flexible model. This might lead to over-fitting or inflated inference uncertainty, especially when subsequent transition times are close together and the network structures must be inferred from short time series segments. To address this problem, previous works have proposed to couple information sequentially 8, 9, 14, 15 or globally 16,17 by assuming similar parameters for networks on different time segments. However, the traditional metrics for evaluating a stationary DBN, e.g. the Bayesian-Dirichlet equivalent (BDe) metric 18 and the Bayesian Information Criterion (BIC) 19, only use the data in each time segment to separately evaluate each individual network. These metrics cannot benefit from information sharing among different time segments. The works 9,11 extend the traditional BDe and BGe scores for non-stationary networks in discrete and continuous conditions, called nsBDe and cpBGe, respectively. However, they are still simple applications of traditional BDe and BGe to stationary DBNs on each time segment.

In this paper, we propose a novel node-specific, non-stationary DBN model by extending each hidden node of Hidden Markov Model (HMM) into a DBN that is capable of modeling the underlying time-evolving network structures. Next, we propose an improved Structural Expectation Maximization (SEM) algorithm to learn an HMDBN model from a time-series dataset. On the basis of SEM, we first derive the re-estimation formulas for all parameters of our model by maximizing the objective function of SEM; we then derive a novel generalized BIC under the non-stationary assumption; finally, we propose a heuristic, time-efficient approach to reduce the search space of the SEM algorithm.
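
To make the point about traditional metrics concrete, the following is a minimal sketch of the standard BIC for one discrete node, applied separately to each time segment. It is not the BWBIC proposed in this paper, whose formula is not given in this excerpt. The function names, the `boundaries` argument, and the convention that `parents[t]` already holds the parent states aligned with `child[t]` are all illustrative assumptions.

```python
import math
from collections import Counter


def bic_discrete_node(child, parents, child_arity, parent_arities):
    """Standard BIC of one discrete node given its parents (hypothetical helper).

    child:   list of child states, one per sample
    parents: list of tuples, the parent configuration aligned with each sample
    """
    n = len(child)
    joint = Counter(zip(parents, child))   # counts N_ijk
    marginal = Counter(parents)            # counts N_ij
    # Maximized multinomial log-likelihood: sum of N_ijk * log(N_ijk / N_ij)
    loglik = sum(c * math.log(c / marginal[pa]) for (pa, _), c in joint.items())
    # Free parameters: (r - 1) per parent configuration
    n_params = math.prod(parent_arities) * (child_arity - 1)
    return loglik - 0.5 * n_params * math.log(n)


def segmentwise_bic(child, parents, boundaries, child_arity, parent_arities):
    """Sum of per-segment BIC scores; each segment only sees its own data."""
    total = 0.0
    for start, end in zip(boundaries[:-1], boundaries[1:]):
        total += bic_discrete_node(
            child[start:end], parents[start:end], child_arity, parent_arities
        )
    return total
```

Because each call to `bic_discrete_node` uses only the samples inside its own segment, short segments contribute scores estimated from very few observations, which is the limitation described above.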
Compared to some recent state-of-the-art methods in the literature, the experimental evaluation of our proposed method on both synthetic and real biological data demonstrates more consistently high prediction accuracy and significantly improved computation speed, even without prior knowledge or parameter settings. Our approach makes the following contributions:

Novel non-stationary DBN model. Two well-studied methods, HMM and DBN, are combined to address real biological problems. The existing research achievements for these two models, such as the well-known Viterbi algorithm, the Baum-Welch algorithm 20 and the score-based greedy hill-climbing algorithm, motivate us to propose a search-based method to decode the transition times as well as learn the parameters and network structures.

The novel BWBIC metric can benefit from information sharing among networks on different time segments. This sharing allows our proposed metric to more reasonably evaluate a candidate non-stationary network. Compared to traditional metrics, it can greatly improve the prediction accuracy and reduce over-fitting.

Time-efficient heuristic searching method. A heuristic approach is proposed to reduce the search space for a non-stationary DBN to that of a stationary DBN, thereby substantially improving the computation speed.
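
As an illustration of how the Viterbi algorithm mentioned above can decode transition times, here is a minimal generic sketch in log space. It is a textbook HMM decoder, not the HMDBN procedure itself. It assumes that `log_emit[t][k]` holds the log-probability of the observation at time `t` under candidate network `k`; how HMDBN computes that quantity is not described in this excerpt, and all names are hypothetical.

```python
import math


def viterbi(log_init, log_trans, log_emit):
    """Most likely hidden-state path of a generic HMM (log-space Viterbi).

    log_init[k]     : log P(state_0 = k)
    log_trans[j][k] : log P(state_t = k | state_{t-1} = j)
    log_emit[t][k]  : log P(observation_t | state_t = k)  (assumed given)
    """
    n_states = len(log_init)
    score = [log_init[k] + log_emit[0][k] for k in range(n_states)]
    backpointers = []
    for t in range(1, len(log_emit)):
        new_score, ptr = [], []
        for k in range(n_states):
            # Best predecessor state for landing in state k at time t
            best_j = max(range(n_states), key=lambda j: score[j] + log_trans[j][k])
            ptr.append(best_j)
            new_score.append(score[best_j] + log_trans[best_j][k] + log_emit[t][k])
        score = new_score
        backpointers.append(ptr)
    # Trace back from the best final state
    state = max(range(n_states), key=lambda k: score[k])
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def transition_times(path):
    """Time points at which the decoded state differs from the previous one."""
    return [t for t in range(1, len(path)) if path[t] != path[t - 1]]
```

In this reading, each hidden state stands for one candidate network structure, so the positions returned by `transition_times` play the role of the change points that sampling-based methods estimate with RJMCMC.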
doi:10.1038/srep17841 pmid:26680653 pmcid:PMC4683538 fatcat:a4c2qrifxbe73ocd6m3zc43oeq