A Generative Adversarial Network Model for Disease Gene Prediction with RNA-seq Data

Xue Jiang, Jingjing Zhao, Wei Qian, Weichen Song, Guan Ning Lin
2020 IEEE Access  
Deep learning models often need large amounts of training data (thousands of samples) to effectively extract hidden patterns and thus achieve better results. However, in the field of brain-related diseases, omics data obtained with advanced sequencing technologies typically include far fewer patient samples (tens to hundreds). Because of this small-sample problem, statistical and intelligent machine learning methods have been unable to obtain a convergent gene set when prioritizing biomarkers. Furthermore, mathematical models designed for prioritizing biomarkers perform differently on different datasets. The architecture of the generative adversarial network (GAN) can address this bottleneck. Through the game between the generator and the discriminator, the generator produces samples whose distribution is similar to that of the training set, and the prediction accuracy and robustness of the discriminator can be significantly improved. In this study, we therefore designed a new GAN model with a denoising auto-encoder (DAE) as the generator and a multilayer perceptron (MLP) as the discriminator. The prediction residual error was backpropagated to the decoder part of the DAE, modifying the captured probability distribution. Based on this model, we further designed a framework to predict disease genes from RNA-seq data. The model improves the identification accuracy of disease genes over state-of-the-art approaches. An analysis of the experimental results uncovered new disease-related genes and disease-associated pathways in the brain, which in turn provide insight into the molecular mechanisms underlying disease phenotypes.

INDEX TERMS Denoising auto-encoder, multilayer perceptron, generative adversarial network, RNA-seq data.
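The abstract describes the architecture only at a high level. Below is a minimal NumPy sketch of a GAN whose generator is a denoising auto-encoder and whose discriminator is a multilayer perceptron, with the discriminator's error backpropagated into the decoder only. Everything concrete here is an assumption for illustration, not the authors' implementation: the layer sizes, tanh activations, Gaussian input corruption, the reconstruction pre-training step, the real-versus-generated binary cross-entropy objective, the synthetic data, and all names (`DAEGenerator`, `MLPDiscriminator`, `d_step`, `g_step`, `adversarial`).

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(p, y, eps=1e-9):
    """Binary cross-entropy between predicted probabilities p and labels y."""
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))

class DAEGenerator:
    """Hypothetical one-hidden-layer denoising auto-encoder (biases omitted)."""
    def __init__(self, n_in, n_hidden, noise_std=0.1):
        self.W_enc = rng.normal(0, 0.1, (n_in, n_hidden))
        self.W_dec = rng.normal(0, 0.1, (n_hidden, n_in))
        self.noise_std = noise_std  # assumption: additive Gaussian corruption

    def forward(self, x):
        self.x_noisy = x + rng.normal(0, self.noise_std, x.shape)  # corrupt input
        self.h = np.tanh(self.x_noisy @ self.W_enc)                 # encoder
        return self.h @ self.W_dec                                  # linear decoder

    def reconstruct_step(self, x, lr):
        """Train encoder and decoder to recover clean x from noisy x (MSE loss)."""
        x_hat = self.forward(x)
        d_out = 2.0 * (x_hat - x) / x.size
        d_h = (d_out @ self.W_dec.T) * (1 - self.h ** 2)  # uses pre-update W_dec
        self.W_dec -= lr * (self.h.T @ d_out)
        self.W_enc -= lr * (self.x_noisy.T @ d_h)
        return float(np.mean((x_hat - x) ** 2))

class MLPDiscriminator:
    """Hypothetical one-hidden-layer perceptron returning P(sample is real)."""
    def __init__(self, n_in, n_hidden):
        self.W1 = rng.normal(0, 0.1, (n_in, n_hidden))
        self.w2 = rng.normal(0, 0.1, (n_hidden, 1))

    def forward(self, x):
        self.x = x
        self.a = np.tanh(x @ self.W1)
        return sigmoid(self.a @ self.w2)

    def backward_logit(self, dz):
        """Given dLoss/dlogit, return (grad_W1, grad_w2, dLoss/dx)."""
        da = (dz @ self.w2.T) * (1 - self.a ** 2)
        return self.x.T @ da, self.a.T @ dz, da @ self.W1.T

def d_step(D, x_real, x_fake, lr):
    """Update D to label real samples 1 and generated samples 0."""
    x = np.vstack([x_real, x_fake])
    y = np.vstack([np.ones((len(x_real), 1)), np.zeros((len(x_fake), 1))])
    p = D.forward(x)
    g_W1, g_w2, _ = D.backward_logit((p - y) / len(x))  # dBCE/dlogit = (p - y)/N
    D.W1 -= lr * g_W1
    D.w2 -= lr * g_w2
    return bce(p, y)

def g_step(G, D, x_real, lr):
    """Backpropagate D's error on generated samples into the decoder only."""
    x_fake = G.forward(x_real)
    p = D.forward(x_fake)
    ones = np.ones_like(p)
    _, _, dx = D.backward_logit((p - ones) / len(p))  # push fakes toward "real"
    G.W_dec -= lr * (G.h.T @ dx)                      # encoder is left untouched
    return bce(p, ones)

def adversarial(G, D, x_real, epochs=200, lr=0.05):
    for _ in range(epochs):
        d_loss = d_step(D, x_real, G.forward(x_real), lr)
        g_loss = g_step(G, D, x_real, lr)
    return d_loss, g_loss

if __name__ == "__main__":
    # Synthetic stand-in for a samples-by-genes matrix (not real RNA-seq data)
    x_real = rng.normal(2.0, 0.5, size=(64, 20))
    G, D = DAEGenerator(20, 8), MLPDiscriminator(20, 8)
    for _ in range(300):
        mse = G.reconstruct_step(x_real, lr=0.5)  # let the DAE capture the data first
    d_loss, g_loss = adversarial(G, D, x_real)
    print(f"reconstruction MSE {mse:.3f}, D loss {d_loss:.3f}, G loss {g_loss:.3f}")
```

Freezing the encoder during the adversarial phase is one reading of the statement that the residual error reaches "the decoder part of the DAE". The abstract does not say whether the discriminator is trained on real-versus-generated labels or on disease-gene labels, so the loss used here is an assumption.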
doi:10.1109/access.2020.2975585