A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2019; you can also visit the original URL.
Summary: Human alpha satellite and satellite 2/3 contribute to several percent of the human genome. However, identifying these sequences with traditional algorithms is computationally intensive. ... Here we develop dna-brnn, a recurrent neural network to learn the sequences of the two classes of centromeric repeats. It achieves high similarity to RepeatMasker and is times faster. ... Acknowledgement: We thank the second anonymous reviewer for pointing out an issue with how we ran RepeatMasker, which led to an unfair performance comparison in an earlier version of this manuscript. ... arXiv:1901.07327v2 fatcat:qw3p4pvmnbh5jlwdjglclzif6y
So identifying and classifying repeats is an important step in genome annotation. ... This combines the basic concepts of Li (Bioinformatics 35:4408–4410, 2019) with the attention mechanism, a technique developed for neural machine translation, for the task of nucleotide-level annotation ... Nevertheless, DeepGRP is able to correctly identify several repeats in mm10/chr2, for which it achieves considerably smaller FNRs than dna-brnn. ... doi:10.1186/s13015-021-00199-0 fatcat:i4c5y6cm4zahzoogftd2djl4re
The centromeres consist of megabase-scale tandemly repeated satellite arrays, which support high CENH3 occupancy and are densely DNA methylated, with satellite variants private to each chromosome. ... CENH3 preferentially occupies satellites with least divergence and greatest higher-order repetition. ... network (BRNN) with long short-term memory (LSTM) units to detect DNA 5mC methylation. ... doi:10.1101/2021.05.30.446350 fatcat:doxvblxk3ff5lnjtqufslrceta
Long reads are now able to span difficult, heterochromatic regions, including full centromeres, and characterize chromosomes from "telomere to telomere". ... I also consider the forthcoming challenges and solutions with regard to long reads, where we expect the shift from the problem of repeat localization within a single individual to the problem of repeat ... Identifying centromeric satellites with dna-brnn. Bioinformatics 2019, 35, 4408-4410. 29. Cheng, H.; Concepcion, G.T.; Feng, X.; Zhang, H.; Li, H. ... doi:10.3390/genes12010048 pmid:33396198 fatcat:wgvmzs3ptfaznbxlp5kxymjebe
Human pan-genome studies offer the opportunity to identify human non-reference sequences (NRSs) which are, by definition, not represented in the reference human genome (GRCh38). ... The major sources of contamination are related to Rhizobiales, Burkholderiales, Pseudomonadales and Lactobacillales, which may have been associated with the original samples or introduced later during ... To identify repeats and low complexity regions missed by RepeatMasker, we additionally run dna-brnn [15], a program specific for human centromeric alpha satellite and satellite 2/3, dustmasker v1.0.0 ... doi:10.1101/2020.03.16.994376 fatcat:ludlgktkfvealhilppd4trwqki