A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2018.
Acknowledgements: We are grateful to James Tripp from University of California Santa Cruz, who took the time to alert us to one of these spurious families. ... Once one gene has been spuriously predicted and put in the sequence database, there is a danger that future genome projects will annotate new protein-coding genes by similarity to the first spurious ORF ... These models are designed to identify commonly recurring spuriously predicted ORFs. ...
doi:10.1093/database/bas003 pmid:22434837 pmcid:PMC3308159 fatcat:274zzsfhfzfhxlz5bwp3kvtsl4
We have developed the Spurio tool to help identify spurious protein predictions in prokaryotes. ... Our initial experiments suggest that less than 1% of the proteins in the UniProtKB sequence database are likely to be spurious and that Spurio is able to identify over 60 times more spurious proteins than ... Grant information: The authors declare that no grants were involved in supporting this work ...
doi:10.12688/f1000research.14050.1 pmid:29721311 pmcid:PMC5897793 fatcat:c2ulay4ou5f55jv6dpfum676g4
Here we draw attention to a large number of likely genes missing from annotations using common tools such as Glimmer and BLAST. ... Annotation methods vary considerably and may fail to identify some genes. ... The other two spurious genes in this set, called translations of CRISPR regions by AntiFam, are homologs to two genes in Syntrophus aciditrophicus SB that were annotated as a "putative cytoplasmic protein ...
doi:10.1186/1745-6150-7-37 pmid:23111013 pmcid:PMC3534567 fatcat:f64hxyyxureh5drimum5boar24
We have developed the Spurio tool to help identify spurious protein predictions in prokaryotes. ... Our initial experiments suggest that less than 1% of the proteins in the UniProtKB sequence database are likely to be spurious and that Spurio is able to identify over 60 times more spurious proteins than ... In this paper we begin to address this problem by creating a generic tool to identify spurious proteins. We term the task of identifying and deleting spurious gene predictions as gene unprediction. ...
doi:10.5256/f1000research.15280.r31445 fatcat:bgnqozvbw5dixlvlgoopq3orhe
A further 1470 genes annotated as coding in all three reference sets have characteristics that are typical of non-coding genes or pseudogenes. ... Data from large-scale genetic variation analyses suggests that most are not under protein-like purifying selection and so are unlikely to code for functional proteins. ... ACKNOWLEDGEMENTS: The authors would like to thank Iakes Ezkurdia and Jon Mudge for their input on this paper. ...
doi:10.1093/nar/gky587 pmid:29982784 pmcid:PMC6101605 fatcat:4pyw4qdhtndkbcs7yag3prxcee