Robinson-Foulds Reticulation Networks [article]

Alexey Markin, Tavis K Anderson, Venkata SKT Vadali, Oliver Eulenstein
2019 bioRxiv pre-print
Phylogenetic (hybridization) networks allow investigation of evolutionary species histories that involve complex phylogenetic events other than speciation, such as reassortment in virus evolution or introgressive hybridization in invertebrates and mammals. Reticulation networks can be inferred by solving the reticulation network problem, typically known as the hybridization network problem. Given a collection of phylogenetic input trees, this problem seeks a minimum reticulation network with the smallest number of reticulation vertices into which the input trees can be embedded exactly. Unfortunately, this problem is limited in practice, since minimum reticulation networks can be easily obfuscated by even small topological errors that typically occur in input trees inferred from biological data. We adapt the reticulation network problem to address erroneous input trees using the classic Robinson-Foulds distance. The RF embedding cost allows trees to be embedded into reticulation networks inexactly, but up to a measurable error. The adapted problem, called the Robinson-Foulds reticulation network (RF-Network) problem, is, as we show and like many other problems applied in molecular biology, NP-hard. To address this, we employ local search strategies that have been successfully applied to other NP-hard phylogenetic problems. Our local search method benefits from recent theoretical advancements in this area. Further, we introduce algorithms that are effective in practice for the computational challenges involved in our local search approach. Using simulations, we experimentally validate the ability of our method, RF-Net, to reconstruct correct phylogenetic networks in the presence of error in input data. Finally, we demonstrate how RF-networks can help identify reassortment in influenza A viruses and provide insight into the evolutionary history of these viruses. RF-Net was able to estimate a large and credible reassortment network with 164 taxa.
doi:10.1101/642793 fatcat:dtyox5djavfujlf3j3nucnbptu
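
For readers unfamiliar with the Robinson-Foulds distance mentioned in the abstract, the following is a minimal sketch of its rooted, cluster-based form: the number of clusters (leaf sets below internal nodes) found in exactly one of two trees. It is not the RF-Net implementation and does not compute the RF embedding cost of a tree in a network. The nested-tuple tree encoding and the function names `clusters` and `rf_distance` are assumptions made for illustration only.

```python
def clusters(tree):
    """Return the non-trivial clusters of a rooted tree.

    Assumption: a tree is a nested tuple whose leaves are hashable labels,
    e.g. ((("A", "B"), "C"), "D"). A cluster is the set of leaf labels
    below an internal node.
    """
    found = set()

    def leaves(node):
        # A non-tuple node is a leaf; singleton clusters are not recorded.
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(leaves(child) for child in node))
        found.add(below)
        return below

    all_leaves = leaves(tree)
    # The root cluster is identical for all trees on the same taxa, so drop it.
    found.discard(all_leaves)
    return found


def rf_distance(tree_a, tree_b):
    """Unnormalized rooted RF distance: size of the symmetric difference
    of the two cluster sets."""
    return len(clusters(tree_a) ^ clusters(tree_b))


if __name__ == "__main__":
    t1 = ((("A", "B"), "C"), "D")
    t2 = ((("A", "C"), "B"), "D")
    print(rf_distance(t1, t2))
```

In this example the two trees share the cluster {A, B, C} and differ in {A, B} versus {A, C}, so a single misplaced leaf already contributes to the distance. This is the kind of small topological error that the abstract says an exact embedding cannot tolerate.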