Identifying centromeric satellites with dna-brnn [article]

Heng Li
2019 arXiv   pre-print
Summary: Human alpha satellite and satellite 2/3 contribute to several percent of the human genome. However, identifying these sequences with traditional algorithms is computationally intensive.  ...  Here we develop dna-brnn, a recurrent neural network to learn the sequences of the two classes of centromeric repeats. It achieves high similarity to RepeatMasker and is times faster.  ...  ACKNOWLEDGEMENT We thank the second anonymous reviewer for pointing out an issue with our running RepeatMasker, which led to unfair performance comparison in an earlier version of this manuscript.  ... 
arXiv:1901.07327v2 fatcat:qw3p4pvmnbh5jlwdjglclzif6y

DeepGRP: engineering a software tool for predicting genomic repetitive elements using Recurrent Neural Networks with attention

Fabian Hausmann, Stefan Kurtz
2021 Algorithms for Molecular Biology  
So identifying and classifying repeats is an important step in genome annotation.  ...  This combines the basic concepts of Li (Bioinformatics 35:4408–4410, 2019) with current techniques developed for neural machine translation, the attention mechanism, for the task of nucleotide-level annotation  ...  Nevertheless, DeepGRP is able to correctly identify several repeats in mm10/chr2, for which it achieves considerably smaller FNRs than dna-brnn.  ... 
doi:10.1186/s13015-021-00199-0 fatcat:i4c5y6cm4zahzoogftd2djl4re

The genetic and epigenetic landscape of the Arabidopsis centromeres [article]

Matthew Naish, Michael Alonge, Piotr Wlodzimierz, Andrew J Tock, Bradley W Abramson, Christophe A Lambing, Pallas Kuo, Natasha Yelina, Nolan Hartwick, Kelly Colt, Tetsuji Kakutani, Robert A Martienssen (+4 others)
2021 bioRxiv   pre-print
The centromeres consist of megabase-scale tandemly repeated satellite arrays, which support high CENH3 occupancy and are densely DNA methylated, with satellite variants private to each chromosome.  ...  CENH3 preferentially occupies satellites with least divergence and greatest higher-order repetition.  ...  network (BRNN) with long short-term memory (LSTM) units to detect DNA 5mC methylation.  ... 
doi:10.1101/2021.05.30.446350 fatcat:doxvblxk3ff5lnjtqufslrceta

Probably Correct: Rescuing Repeats with Short and Long Reads

Monika Cechova
2020 Genes  
Long reads are now able to span difficult, heterochromatic regions, including full centromeres, and characterize chromosomes from "telomere to telomere".  ...  I also consider the forthcoming challenges and solutions with regards to long reads, where we expect the shift from the problem of repeat localization within a single individual to the problem of repeat  ...  Identifying centromeric satellites with dna-brnn. Bioinformatics 2019, 35, 4408-4410. 29. Cheng, H.; Concepcion, G.T.; Feng, X.; Zhang, H.; Li, H.  ... 
doi:10.3390/genes12010048 pmid:33396198 fatcat:wgvmzs3ptfaznbxlp5kxymjebe

Microbial contaminants cataloged as novel human sequences in recent human pan-genomes [article]

Mosè Manni, Evgeny M Zdobnov
2020 bioRxiv   pre-print
Human pan-genome studies offer the opportunity to identify human non-reference sequences (NRSs) which are, by definition, not represented in the reference human genome (GRCh38).  ...  The major sources of contamination are related to Rhizobiales, Burkholderiales, Pseudomonadales and Lactobacillales, which may have been associated with the original samples or introduced later during  ...  To identify repeats and low complexity regions missed by RepeatMasker, we additionally run dna-brnn [15], a program specific for human centromeric alpha satellite and satellite 2/3, dustmasker v1.0.0  ... 
doi:10.1101/2020.03.16.994376 fatcat:ludlgktkfvealhilppd4trwqki