DeepLncLoc: a deep learning framework for long non-coding RNA subcellular localization prediction based on subsequence embedding [article]

Min Zeng, Yifan Wu, Chengqian Lu, Fuhao Zhang, Fang-xiang Wu, Min Li
2021 bioRxiv pre-print
Motivation: Long non-coding RNAs (lncRNAs) are a class of RNA molecules longer than 200 nucleotides. Growing evidence shows that the subcellular localization of lncRNAs can provide valuable insights into their biological functions. Existing computational methods for predicting lncRNA subcellular localization use k-mer features to encode lncRNA sequences. However, k-mer features alone discard sequence order information. Results: We proposed a deep learning framework, DeepLncLoc, to predict lncRNA subcellular localization. In DeepLncLoc, we introduced a new subsequence embedding method that preserves the order information of lncRNA sequences. The method first divides a sequence into consecutive subsequences, then extracts the patterns of each subsequence, and finally combines these patterns to obtain a complete representation of the lncRNA sequence. A text convolutional neural network is then employed to learn high-level features and perform the prediction task. Compared with traditional machine learning models using k-mer features and with existing predictors, DeepLncLoc achieved better performance, which indicates that it can effectively predict lncRNA subcellular localization. Our study not only presented a novel computational model for predicting lncRNA subcellular localization but also provided a new subsequence embedding method that is expected to be applicable to other sequence-based prediction tasks.
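The contrast between a single global k-mer profile and an order-preserving subsequence representation can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify how the patterns of each subsequence are extracted, so per-subsequence k-mer frequencies are used here as a stand-in. The function names (`kmer_profile`, `split_consecutive`, `subsequence_embedding`), the near-equal split, the DNA-style alphabet `ACGT`, and the toy sequences are all assumptions made for illustration.

```python
from itertools import product

ALPHABET = "ACGT"  # assumption: sequences are given as uppercase cDNA


def kmer_profile(seq, k):
    """Return the normalized k-mer frequency vector of seq (fixed lexicographic order)."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])  # k-mers with other symbols are skipped
        if j is not None:
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(kmers)


def split_consecutive(seq, n_parts):
    """Split seq into n_parts consecutive, non-overlapping chunks of near-equal length."""
    base, extra = divmod(len(seq), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        end = start + base + (1 if i < extra else 0)
        parts.append(seq[start:end])
        start = end
    return parts


def subsequence_embedding(seq, n_parts, k):
    """Stand-in embedding: one k-mer profile per chunk, kept in sequence order."""
    return [kmer_profile(part, k) for part in split_consecutive(seq, n_parts)]


# Two toy sequences with identical 2-mer composition but different order.
s1 = "ATATAGAGA"
s2 = "AGAGATATA"

# A single global k-mer profile cannot tell them apart ...
same_global = kmer_profile(s1, 2) == kmer_profile(s2, 2)
# ... while the ordered per-chunk representation can.
same_ordered = subsequence_embedding(s1, 3, 2) == subsequence_embedding(s2, 3, 2)
print(same_global, same_ordered)
```

Under these assumptions, the result is a matrix with one row per subsequence, which is the kind of input a text convolutional network can scan in the same way it scans a sequence of word vectors.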
doi:10.1101/2021.03.13.435245 fatcat:qmnwxnjk45elhph6u27rszub3y