A New Method for Database Searching and Clustering

Antje Krause, Martin Vingron
1997 Genome Informatics Series  
An iterative database searching method is introduced and applied to the design of a database clustering procedure. The search method virtually never produces false-positive hits while still determining meaningfully large sets of sequences related to the query. A novel set-theoretic database clustering algorithm exploits this feature and avoids a traditional, distance-based clustering step. This makes it fast and applicable to data sets as large as, e.g., the Swiss-Prot database. In practice we achieve unambiguous assignment of 80% of Swiss-Prot sequences to non-overlapping sequence clusters in an entirely automatic fashion.
doi:10.11234/gi1990.8.90 fatcat:bekpxrkiwbaj3gvasp45t7pj7a
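
The abstract does not spell out the search or clustering procedure, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes a hypothetical `search` callable that returns the reliable hits for one sequence, expands a query's hit set by re-searching with every new hit, and then keeps as clusters only those hit sets that are identical to or disjoint from every other hit set, using set comparisons instead of distances. The names `iterative_search`, `non_overlapping_clusters`, `toy_hits` and `toy_search` are illustrative and do not come from the paper.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List, Set


def iterative_search(query: str, search: Callable[[str], Set[str]]) -> FrozenSet[str]:
    """Re-search with every newly found hit until no new sequences appear.

    `search` is a hypothetical stand-in for a single database search that
    returns only hits passing a conservative significance threshold.
    """
    found = {query}
    frontier = [query]
    while frontier:
        current = frontier.pop()
        for hit in search(current):
            if hit not in found:
                found.add(hit)
                frontier.append(hit)
    return frozenset(found)


def non_overlapping_clusters(
    ids: Iterable[str], search: Callable[[str], Set[str]]
) -> List[FrozenSet[str]]:
    """Keep hit sets that are equal to or disjoint from every other hit set.

    Only set operations are used; no pairwise distances are computed.
    Sequences whose hit sets partially overlap another set stay unassigned.
    """
    hit_sets = {iterative_search(seq_id, search) for seq_id in ids}
    clusters = [
        a for a in hit_sets
        if all(a == b or a.isdisjoint(b) for b in hit_sets)
    ]
    # Sort for a deterministic result.
    return sorted(clusters, key=sorted)


# Toy data (invented for illustration): "e" finds "c", but "c" does not find "e".
toy_hits: Dict[str, Set[str]] = {
    "a": {"b"},
    "b": {"a"},
    "c": {"d"},
    "d": {"c"},
    "e": {"c"},
}


def toy_search(seq_id: str) -> Set[str]:
    return toy_hits.get(seq_id, set())


clusters = non_overlapping_clusters(toy_hits, toy_search)
assigned = sum(len(c) for c in clusters) / len(toy_hits)
```

In this toy data only the pair `a`, `b` forms an unambiguous cluster, because the hit set found from `e` partially overlaps the one found from `c` and `d`. The `assigned` fraction plays the same role as the share of unambiguously assigned sequences reported in the abstract, although the toy value says nothing about real databases.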