Leveraging Stacked Denoising Autoencoder in Prediction of Pathogen-Host Protein-Protein Interactions

Huaming Chen, Jun Shen, Lei Wang, Jiangning Song
2017 IEEE International Congress on Big Data (BigData Congress)
In big data research related to bioinformatics, one of the most critical areas is proteomics. In this paper, we focus on protein-protein interactions, especially pathogen-host protein-protein interactions (PHPPIs), which reveal critical molecular processes in biology. Conventionally, biologists identify these interactions with in-lab methods, including small-scale biochemical, biophysical, and genetic experiments as well as large-scale experimental methods (e.g., yeast two-hybrid analysis). These in-lab methods are time-consuming and labor-intensive. Since interactions between proteins from different species play critical roles in both infectious diseases and drug design, the motivation behind this study is to provide biologists with a basic framework based on big data analytics and deep learning models. Our contribution is to leverage an unsupervised learning model, specifically stacked denoising autoencoders, to achieve more efficient prediction performance on PHPPIs. We further detail the framework, based on the unsupervised learning model, for PHPPI research, and curate a large imbalanced PHPPI dataset. With the unsupervised learning model, our approach demonstrates better results on the PHPPI dataset.
doi:10.1109/bigdatacongress.2017.54 dblp:conf/bigdata/ChenSWS17 fatcat:7dtx36evw5hyjhe5tp2yzmo4vm