High precision in microRNA prediction: a novel genome-wide approach based on convolutional deep residual networks [article]

Cristian Ariel Yones, Jonathan Raad, Leandro Bugnon, Diego Humberto Milone, Georgina Stegmayer
2020 bioRxiv   pre-print
Motivation: MicroRNAs (miRNAs) are small non-coding RNAs that play a key role in the regulation of gene expression. Their importance is now widely acknowledged by the community, and precise prediction of novel candidates with computational methods is still much needed. This could be done by searching for homologs with sequence alignment tools, but such a search is restricted to sequences very similar to the known miRNA precursors (pre-miRNAs). Furthermore, other important properties of pre-miRNAs, such as the secondary structure, are not taken into account by these methods. Many machine learning approaches have been proposed in recent years to fill this gap, but they were tested under very controlled conditions, which are not fulfilled, for example, when predicting in newly sequenced genomes where no miRNAs are known. When these methods are used under real conditions, the precision achieved is far from the published one. Results: This work provides a novel approach to the computational prediction of pre-miRNAs: a convolutional deep residual neural network. The proposed model has been tested on several complete genomes of animals and plants, achieving a precision up to 5 times higher than other approaches at the same recall rates. In addition, a novel validation methodology is used to ensure that the reported performance can be achieved when the method is applied to new, unknown species. Availability: To provide fast and easy access to mirDNN, a web demo is available at http://sinc.unl.edu.ar/web-demo/mirdnn/. It can process FASTA files with multiple sequences to calculate the prediction scores, and can generate the nucleotide importance plots. The full source code of this project is available at http://sourceforge.net/projects/sourcesinc/files/mirdnn Contact: cyones@sinc.unl.edu.ar
doi:10.1101/2020.10.23.352179 fatcat:tktyltycz5gzvoluia5dvrity4
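
The abstract notes that precision measured under controlled conditions does not carry over to genome-wide prediction. A minimal sketch of why, assuming a hypothetical classifier with fixed sensitivity and specificity: precision is TP / (TP + FP), so it drops as negative candidates come to outnumber true pre-miRNAs. The rates and candidate counts below are illustrative assumptions, not figures from the paper, and the `precision` helper is not part of mirDNN.

```python
def precision(sensitivity, specificity, n_pos, n_neg):
    """Expected precision for a classifier with the given rates.

    n_pos: number of true pre-miRNAs among the candidates
    n_neg: number of candidates that are not pre-miRNAs
    """
    true_positives = sensitivity * n_pos
    false_positives = (1.0 - specificity) * n_neg
    return true_positives / (true_positives + false_positives)


# Hypothetical classifier: 90% sensitivity, 99% specificity (assumed values).
# Balanced benchmark: as many negatives as positives.
balanced = precision(0.90, 0.99, n_pos=1000, n_neg=1000)

# Genome-wide scan: negatives outnumber positives 1000 to 1 (assumed ratio).
genome_wide = precision(0.90, 0.99, n_pos=1000, n_neg=1_000_000)

print(f"balanced test set: {balanced:.3f}")
print(f"genome-wide scan:  {genome_wide:.3f}")
```

With these assumed numbers, the same classifier goes from a precision of about 0.99 on the balanced set to about 0.08 on the imbalanced one, with recall unchanged.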