An Iteration Method for Identifying Yeast Essential Proteins from Weighted PPI Network Based on Topological and Functional Features of Proteins

Shiyuan Li, Zhiping Chen, Xin He, Zhen Zhang, Tingrui Pei, Yihong Tan, Lei Wang
2020 IEEE Access  
Accumulating studies have indicated that essential proteins play critical roles in numerous biological processes. With the rapid development of high-throughput technologies, a large amount of protein-protein interaction (PPI) data has been collected for Saccharomyces cerevisiae, which facilitates the construction of PPI networks. A series of computational methods for predicting essential proteins from PPI networks have since been proposed. However, the prediction accuracy of these computational methods is still not satisfactory. In this paper, a novel prediction method called CVIM is proposed to infer potential essential proteins. In CVIM, the original PPI network is first transformed into a weighted PPI network by applying the Pearson correlation coefficient (PCC) to protein gene expression data. Then, based on the weighted PPI network and information on orthologous proteins, critical network topological features and protein functional features are extracted for each protein in the weighted PPI network. Finally, based on these newly extracted topological and functional features, an iterative algorithm is designed to predict essential proteins. To evaluate the identification performance of CVIM, we compared it with 13 state-of-the-art prediction methods. Experimental results show that CVIM achieves prediction accuracies of 92%, 80% and 71% among the top 1%, 5% and 10% of candidate proteins, respectively, significantly outperforming those state-of-the-art methods. These results demonstrate that the prediction accuracy of essential proteins can be effectively improved by integrating the functional and network topological characteristics of proteins, which suggests that CVIM may be a valuable addition to future protein research.
INDEX TERMS Characteristic vector, orthologous proteins, essential proteins, weighted protein-protein interaction network, iteration method.
This work is licensed under a Creative Commons Attribution 4.0 License (IEEE Access, Volume 8, 2020).
... efficient prediction methods to identify potential essential proteins [1]-[3]. Existing prediction methods for essential proteins can be roughly divided into two major categories.
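The PCC weighting step and the top-percentage evaluation mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`pearson`, `weight_ppi_edges`, `top_fraction_precision`), the choice to use the raw PCC value as the edge weight, the zero weight for proteins without expression data, and the rounding of the top-k cutoff are all assumptions made for illustration.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant profile carries no co-expression signal
    return cov / math.sqrt(var_x * var_y)

def weight_ppi_edges(edges, expression):
    """Attach a PCC-based weight to every PPI edge (u, v).

    `expression` maps a protein name to its gene expression profile.
    Assumption: edges whose endpoints lack expression data get weight 0.0.
    """
    weighted = {}
    for u, v in edges:
        if u in expression and v in expression:
            weighted[(u, v)] = pearson(expression[u], expression[v])
        else:
            weighted[(u, v)] = 0.0
    return weighted

def top_fraction_precision(scores, essential, fraction):
    """Share of known essential proteins among the top `fraction` of ranked candidates.

    Assumption: the cutoff is ceil(fraction * number of candidates), at least 1.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = max(1, math.ceil(fraction * len(ranked)))
    hits = sum(1 for protein in ranked[:k] if protein in essential)
    return hits / k
```

With ranking scores from any method, `top_fraction_precision(scores, essential, 0.01)` would correspond to the "top 1%" style of figure reported in the abstract, under the cutoff assumption stated above.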
Methods of the first category mainly rely on the topological features of PPI networks. For instance, M. Li et al. proposed a topology potential-based method to infer essential proteins from PPI networks [4], as well as a method called LAC (Local Average Connectivity) that infers essential proteins by evaluating the relationship between proteins and their neighborhoods [5]. Bin Xu, Jihong Guan et al. developed a model to detect key proteins by weighting random walks on protein-protein interaction networks [6]. Y. Jiang, Y. Wang et al. established a method that identifies key proteins by predicting key protein-protein interactions from comprehensive edge weights [7]. In particular, based on the centrality-lethality rule proposed by Jeong et al. [9], researchers have developed various centrality-based methods, such as DC (Degree Centrality) [10], SC (Subgraph Centrality) [11], BC (Betweenness Centrality) [12], EC (Eigenvector Centrality) [13], IC (Information Centrality) [14], CC (Closeness Centrality) [15] and NC (Neighbor Centrality) [16]. These methods identify important proteins from the topology of the PPI network, such as the number of protein connections and the number of common neighbors. Although methods of the first category have made great progress compared with traditional biological experiments, PPI data are obtained through experiments, are incomplete, and often contain noise in the form of false positives and false negatives; as a result, these methods often cannot achieve satisfactory accuracy in identifying essential proteins. Hence, methods of the second category combine the topology of the PPI network with biological information (gene expression data, subcellular localization data, orthology data, gene ontology) to construct prediction models and improve prediction accuracy.
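As a minimal sketch of the topology-only measures of the first category, the Python code below computes degree centrality (the number of protein connections) and a neighbor-centrality-style score built from common neighbors. The helper names are hypothetical, and the edge clustering coefficient formula used here (common neighbors of an edge divided by min(d_u - 1, d_v - 1)) is the commonly cited definition, taken as an assumption rather than from this text.

```python
from collections import defaultdict

def build_adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs, ignoring self-loops."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)

def degree_centrality(adj):
    """DC: number of interaction partners of each protein."""
    return {node: len(neighbors) for node, neighbors in adj.items()}

def edge_clustering_coefficient(adj, u, v):
    """Common neighbors of edge (u, v), normalized by min(d_u - 1, d_v - 1).

    Assumption: returns 0.0 when the denominator is zero.
    """
    common = len(adj[u] & adj[v])
    denom = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return common / denom if denom > 0 else 0.0

def neighbor_centrality(adj):
    """NC-style score: sum of edge clustering coefficients over a protein's edges."""
    return {
        u: sum(edge_clustering_coefficient(adj, u, v) for v in adj[u])
        for u in adj
    }

# Toy network: a triangle A-B-C with a pendant protein D attached to C.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
toy_adj = build_adjacency(toy_edges)
toy_dc = degree_centrality(toy_adj)
toy_nc = neighbor_centrality(toy_adj)
```

In the toy network, C has the highest degree, while D scores zero on the neighbor-based measure because its only edge takes part in no triangle. Both scores depend entirely on which edges are present, which is why noisy or missing PPI data affects them directly.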
For example, Chen Lei et al. used the enrichment of gene ontology terms and KEGG pathways to predict and analyze essential genes [8]. Zhao and Wang designed an iteration method called RWHN for identifying yeast essential proteins from a heterogeneous network that combines PPI networks with protein domains, subcellular localization information and orthologous information [1]. M. Li et al. proposed a prediction method called PEC to identify essential proteins by combining PPI network topology and gene expression [17]. Zhang et al. developed a computational method called CoEWC by combining PPI network topology with protein co-expression characteristics derived from gene expression profiles [18]. Seketoulie Keretsu et al. proposed a method to identify protein complexes based on the weight of the edge between two interacting proteins, in which the weight is defined by the edge clustering coefficient and the gene expression correlation between the interacting proteins [19]. Bingjing Cai et al. presented a biased random
doi:10.1109/access.2020.2993860 fatcat:rqrwkshpb5aq3ivxcia433gskq