A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2007; you can also visit the original URL.
The file type is
To improve the accuracy of tree reconstruction, phylogeneticists are extracting increasingly large multigene data sets from sequence databases. Determining whether a database contains at least k genes sampled from at least m species is an NP-complete problem. However, the skewed distribution of sequences in these databases permits all such data sets to be obtained in reasonable computing times even for large numbers of sequences. We developed an exact algorithm for obtaining the largestdoi:10.1093/molbev/msg115 pmid:12777519 fatcat:j3bx2qsenbhzxnrdr3r4ef2jdy