RSDB: representative protein sequence databases have high information content
2000
Bioinformatics
Motivation: Biological sequence databases are highly redundant for two main reasons: (1) various databanks keep redundant entries, including many identical and nearly identical sequences, and (2) natural sequences often share high sequence identity because of gene duplication. We wanted to know how many sequences can be removed before the databases start losing homology information. Can a database of sequences with mutual sequence identity of 50% or less provide us with the same amount of biological
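The idea of a representative database in which no two sequences exceed 50% mutual identity can be illustrated with a simple greedy filter. This is a minimal sketch under stated assumptions, not the RSDB authors' procedure. It uses a simplified identity measure (longest common subsequence length divided by the length of the shorter sequence) rather than a proper gapped alignment score. It also processes sequences longest-first and keeps a sequence only if its identity to every representative already kept is at or below the threshold. The names lcs_length, identity and select_representatives are hypothetical.

```python
from typing import List


def lcs_length(a: str, b: str) -> int:
    # Longest common subsequence via dynamic programming, O(len(a) * len(b)).
    # Used here as a stand-in for "identical residues in the best alignment".
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
            else:
                cur[j] = max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


def identity(a: str, b: str) -> float:
    # Hypothetical identity measure: shared residues / length of the shorter sequence.
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 0.0
    return lcs_length(a, b) / shorter


def select_representatives(seqs: List[str], threshold: float = 0.5) -> List[str]:
    # Greedy filter: visit sequences longest-first (stable for ties) and keep one
    # only if it is at most `threshold` identical to every representative kept so far.
    # Because identity() is symmetric, every pair of kept sequences satisfies the bound.
    reps: List[str] = []
    for s in sorted(seqs, key=len, reverse=True):
        if all(identity(s, r) <= threshold for r in reps):
            reps.append(s)
    return reps


if __name__ == "__main__":
    toy = ["MKTAYIAKQR", "MKTAYIAKQR", "MKTAYIAKQK", "GGGGWWWW"]
    print(select_representatives(toy))  # ['MKTAYIAKQR', 'GGGGWWWW']
```

In the toy run, the exact duplicate (identity 1.0) and the near-duplicate (9 of 10 residues shared, identity 0.9) are removed, while the unrelated sequence is kept. A production tool would replace the quadratic all-against-representatives comparison and the LCS proxy with faster filtering and real alignments.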
doi:10.1093/bioinformatics/16.5.458
pmid:10871268
fatcat:i56ijyhnk5dmtj6tecbihzwzjq