Identifying enhancers and their strength by the integration of word embedding and Convolution Neural Network

Jhabindra Khanal, Hilal Tayara, Kil To Chong
2020 IEEE Access  
Enhancers are short regulatory elements that play a major role in up-regulating eukaryotic gene expression. Identifying enhancers experimentally is time-consuming and costly; an accurate computational tool is therefore much needed in this area. Existing techniques rely on handcrafted features followed by machine learning, whereas the proposed model extracts enhancer features directly from raw DNA sequences by integrating a natural language processing (NLP) technique, word2vec, with a convolutional neural network (CNN). The resulting tool, iEnhancer-CNN, predicts both enhancers and their strength. The evaluation results show that iEnhancer-CNN is remarkably superior to existing state-of-the-art models: it improved the accuracy of enhancer identification and enhancer strength identification by 2.6% and 11.4%, respectively. A web server for iEnhancer-CNN is freely available at https://home.jbnu.ac.kr/NSCL/iEnhancer-CNN.htm.
INDEX TERMS Convolutional neural network, DNA sequence, deep learning, enhancers, K-mers, word2vec.
doi:10.1109/access.2020.2982666 fatcat:l5mlar7mpfcydnaveyresv5yea
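
The abstract describes treating raw DNA as text: a sequence is cut into K-mers that act as "words" for word2vec, and the resulting embeddings feed a CNN. The sketch below illustrates only the tokenization and indexing step in plain Python. The choice of k = 3 and a stride of 1, and the function names `kmer_tokens`, `build_vocab`, and `encode`, are assumptions made for illustration; the record does not state the values or code used by iEnhancer-CNN, and the embedding training and CNN are omitted.

```python
def kmer_tokens(sequence, k=3, stride=1):
    """Split a DNA sequence into overlapping k-mers ("words").

    k and stride are illustrative defaults, not values taken from the paper.
    """
    if k <= 0 or stride <= 0:
        raise ValueError("k and stride must be positive")
    seq = sequence.upper()
    # Slide a window of width k over the sequence, moving `stride` bases each step.
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(tokenized_sequences):
    """Assign each distinct k-mer an integer id in order of first appearance."""
    vocab = {}
    for tokens in tokenized_sequences:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(tokens, vocab):
    """Map k-mer tokens to integer ids, the usual input to an embedding layer."""
    return [vocab[token] for token in tokens]


if __name__ == "__main__":
    tokens = kmer_tokens("ACGACG", k=3)
    vocab = build_vocab([tokens])
    print(tokens)
    print(encode(tokens, vocab))
```

In a full pipeline, lists of tokens like these would serve as the "sentences" for training a word2vec model, and the learned vectors would replace the integer ids before the convolutional layers.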