A performance study of the impact of recombination on species tree analysis
Zhiwei Wang, Kevin J. Liu
The most widely used state-of-the-art methods for reconstructing species phylogenies from genomic sequence data assume that sampled loci are identically and independently distributed. In principle, free recombination between loci and a lack of intra-locus recombination are necessary to satisfy this assumption. Few studies have quantified the practical impact of recombination on species tree inference methods, and even fewer have used genomic sequence data for this purpose. One prominent
... n is the 2012 study of Lanier and Knowles. A main finding from the study was that species tree inference methods are relatively robust to intra-locus recombination, assuming free recombination between loci. The latter assumption means that the open question regarding the impact of recombination on species tree analysis is not fully resolved. Results: The goal of this study is to further investigate this open question. Using simulations based upon the multi-species coalescent-with-recombination model as well as empirical datasets, we compared common pipeline-based techniques for inferring species phylogenies. The simulation conditions included a range of dataset sizes and several choices for recombination rate which was either uniform across loci or incorporated recombination hotspots. We found that pipelines which explicitly utilize inferred recombination breakpoints to delineate recombination-free intervals result in greater accuracy compared to widely used alternatives that preprocess sequences based upon linkage disequilibrium decay. Furthermore, the use of a relatively simple approach for recombination breakpoint inference does not degrade the accuracy of downstream species tree inference compared to more accurate alternatives. Conclusions: Our findings clarify the impact of recombination upon current phylogenomic pipelines for species tree inference. Pipeline-based approaches which utilize inferred recombination breakpoints to densely sample loci across genomic sequences can tolerate intra-locus recombination and violations of the assumption of free recombination between loci.