Combining Data Sets with Different Phylogenetic Histories

John J. Wiens, D. Cannatella
1998 Systematic Biology  
The possibility that two data sets may have different underlying phylogenetic histories (such as gene trees that deviate from species trees) has become an important argument against combining data in phylogenetic analysis. However, two data sets sampled for a large number of taxa may differ in only part of their histories. This is a realistic scenario and one in which the relative advantages of combined, separate, and consensus analysis become much less clear. I propose a simple methodology for dealing with this situation that involves (1) partitioning the available data to maximize detection of different histories, (2) performing separate analyses of the data sets, and (3) combining the data but considering questionable or unresolved those parts of the combined tree that are strongly contested in the separate analyses (and which therefore may have different histories) until a majority of unlinked data sets support one resolution over another. In support of this methodology, computer simulations suggest that (1) the accuracy of combined analysis for recovering the true species phylogeny may exceed that of either of two separately analyzed data sets under some conditions, particularly when the mismatch between phylogenetic histories is small and the estimates of the underlying histories are imperfect (few characters, high homoplasy, or both) and (2) combined analysis provides a poor estimate of the species tree in areas of the phylogenies with different histories but gives an improved estimate in regions that share the same history. Thus, when there is a localized mismatch between the histories of two data sets, the separate, consensus, and combined analyses may all give unsatisfactory results in certain parts of the phylogeny. Similarly, approaches that allow data combination only after a global test of heterogeneity will suffer from the potential failings of either separate or combined analysis, depending on the outcome of the test. Excision of conflicting taxa is also problematic, in that doing so may obfuscate the position of conflicting taxa within a larger tree, even when their placement is congruent between data sets. Application of the proposed methodology to molecular and morphological data sets for Sceloporus lizards is discussed.
doi:10.1080/106351598260581 pmid:12066302 fatcat:uznxz3ie2jbdjgnv34r4adun5a
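
Step (3) of the methodology in the abstract, treating as questionable those parts of the combined tree that are strongly contested in the separate analyses, can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the paper: trees are reduced to sets of rooted clades, support is a bootstrap-like percentage, the threshold of 70 is a hypothetical cutoff for "strongly supported", and the names `incompatible` and `questionable_clades` are invented.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

Clade = FrozenSet[str]  # a rooted clade, represented by the taxa it contains


def incompatible(a: Clade, b: Clade) -> bool:
    """Two rooted clades conflict if they overlap but neither contains the other."""
    return bool(a & b) and not (a <= b or b <= a)


def questionable_clades(
    combined: Iterable[Clade],
    separate: List[Dict[Clade, float]],
    threshold: float = 70.0,  # hypothetical cutoff for "strong" support
) -> Set[Clade]:
    """Return combined-tree clades that some separately analyzed data set
    strongly contradicts (a simplified reading of step 3)."""
    flagged: Set[Clade] = set()
    for clade in combined:
        for support in separate:
            # A data set contests the clade if it strongly supports
            # any clade that cannot coexist with it on one tree.
            if any(
                value >= threshold and incompatible(clade, other)
                for other, value in support.items()
            ):
                flagged.add(clade)
                break
    return flagged


# Toy example with four taxa; support values are made up.
molecular = {frozenset("AB"): 95.0, frozenset("ABC"): 90.0}
morphology = {frozenset("AB"): 88.0, frozenset("CD"): 85.0}
combined_tree = [frozenset("AB"), frozenset("ABC")]

flagged = questionable_clades(combined_tree, [molecular, morphology])
```

In this toy case the clade (A, B) is accepted because neither data set strongly contradicts it, while (A, B, C) is flagged because the morphological data strongly support (C, D). The sketch does not cover the final clause of step (3), in which a flagged region is resolved once a majority of unlinked data sets support one resolution.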