CLEANUP: a fast computer program for removing redundancies from nucleotide sequence databases
1996
Bioinformatics
A key concept in comparing sequence collections is the issue of redundancy. The production of sequence collections free from redundancy is undoubtedly very useful, both in performing statistical analyses and accelerating extensive database searching on nucleotide sequences. Indeed, publicly available databases contain multiple entries of identical or almost identical sequences. Performing statistical analysis on such biased data makes the risk of assigning high significance to non-significant
doi:10.1093/bioinformatics/12.1.1
fatcat:n2ib27dv55eqlhyi4i2dx6xin4