Tree Shape-based approaches for the Comparative study of Cophylogeny [article]

Mariano Avino, Garway T Ng, YiYing He, Mathias S Renaud, Bradley R Jones, Art FY Poon
2018 bioRxiv pre-print
Cophylogeny is the congruence of phylogenetic relationships between two different groups of organisms due to their long-term interaction, such as between host and pathogen species. Discordance between host and pathogen phylogenies may occur due to pathogen 'host-switch' events, pathogen speciation within a host species, and extinction. Here, we investigated the use of tree shape distance measures to quantify the degree of cophylogeny for the comparative analysis of host-pathogen interactions across taxonomic groups. We first implemented a coalescent model to simulate pathogen phylogenies within a fixed host tree, given the cospeciation probability, migration rate between hosts, and pathogen speciation rate within hosts. Next, we used simulations from this model to evaluate 13 distance metrics between the simulated pathogen trees and the host tree, including Robinson-Foulds distance and two kernel distances that we developed for labeled and unlabeled trees, which use branch lengths and can accommodate trees of different sizes. Finally, we used these distance metrics to revisit actual datasets from published cophylogenetic studies across all taxonomic groups, where authors described the observed associations as representing a high or low degree of cophylogeny. Our simulation analyses demonstrated that some metrics are more informative than others with respect to specific coevolution parameters. For example, the Sim metric was the most responsive to variation in coalescence rates, whereas the unlabeled kernel metric was the most responsive to cospeciation probabilities. We also determined that distance metrics were more informative about the model parameters when the underlying parameters did not take extreme values (e.g., rapid host switching). When applied to real datasets, projection of the host-pathogen tree associations into a parameter space defined by the 13 distance metrics revealed some clustering of studies reporting low concordance. This suggested that different investigators are describing concordance in a consistent way across biological systems, and that these expert subjective assessments can be at least partly quantified using distance metrics. Our results support the hypothesis that tree distance measures can be useful for quantifying host and pathogen cophylogeny.
This motivates the use of distance metrics in the field of coevolution and supports the development of simulation-based methods, i.e., approximate Bayesian computation, to estimate coevolutionary parameters from the discordant shapes of host and pathogen trees.
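As an illustration of the kind of labeled tree distance mentioned in the abstract, the following is a minimal sketch of a Robinson-Foulds-style distance between a host tree and a pathogen tree. It is not the authors' implementation. It assumes rooted trees written as nested tuples, a one-to-one host-pathogen association so that each pathogen tip can be relabeled with its host's name, and a clade-based (rooted) variant of the metric that ignores branch lengths. The names `clades`, `rf_distance`, `host`, and `pathogen_switched` are hypothetical.

```python
def clades(tree):
    """Collect every clade (frozenset of tip labels) of a rooted nested-tuple tree."""
    found = set()

    def tips_below(node):
        # A non-tuple node is a tip; its clade is trivial and is not recorded.
        if not isinstance(node, tuple):
            return frozenset([node])
        below = frozenset().union(*(tips_below(child) for child in node))
        found.add(below)
        return below

    tips_below(tree)
    return found


def rf_distance(tree_a, tree_b):
    """Number of clades present in exactly one of the two trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Assumed toy data: pathogen tips are already relabeled by their host species.
host = ((("A", "B"), "C"), "D")
pathogen_congruent = ((("A", "B"), "C"), "D")
pathogen_switched = ((("A", "C"), "B"), "D")  # topology disagrees with the host tree

print(rf_distance(host, pathogen_congruent))
print(rf_distance(host, pathogen_switched))
```

Under these assumptions, a perfectly congruent pathogen tree scores zero and each clade that is not shared adds one to the distance. Unlike the kernel distances described in the abstract, this sketch requires both trees to carry the same tip labels.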
doi:10.1101/388116 fatcat:n6g3h6sy7vgstewbxut6o7u3da