An efficient, non-phylogenetic method for detecting genes sharing evolutionary signals in phylogenomic datasets
Genome Biology and Evolution
Assessing the compatibility between gene family phylogenies is a crucial and often computationally demanding step in many phylogenomic analyses. Here we describe the Evolutionary Similarity Index (I_ES), a means of assessing shared evolution between gene families using a weighted Orthogonal Distance Regression model applied to sequence distances. The use of pairwise distance matrices circumvents comparisons between gene tree topologies, which are inherently uncertain and sensitive to
evolutionary model choice, phylogenetic reconstruction artifacts, and other sources of error. Furthermore, I_ES enables the many-to-many pairing of multiple copies between similarly evolving gene families. This is done by selecting the non-overlapping pairs of copies, one from each assessed family, that yield the least sum of squared residuals. Analyses of simulated gene family datasets show that the accuracy of I_ES is on par with popular tree-based methods while being less susceptible to noise introduced by sequence alignment and evolutionary model fitting. Applying I_ES to an empirical dataset of 1,322 genes from 42 archaeal genomes identified eight major clusters of gene families with compatible evolutionary trends. The most cohesive cluster consisted of 62 genes with compatible evolutionary signal, occurring as both single-copy genes and multiple homologs per genome; phylogenetic analysis of concatenated alignments from this cluster produced a tree closely matching previously published species trees for Archaea. Four other clusters are mainly composed of accessory genes with limited distribution among Archaea and are enriched for specific metabolic functions. Pairwise evolutionary distances obtained from these accessory gene clusters suggest patterns of inter-phyla horizontal gene transfer. An I_ES implementation is available at https://github.com/lthiberiol/evolSimIndex.
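The two ideas summarized above, regressing one family's pairwise distances on another's with orthogonal residuals and choosing among multiple gene copies by the least sum of squared residuals, can be illustrated with a minimal sketch. It is not the evolSimIndex implementation: it assumes an unweighted fit through the origin (the published model is weighted), it tries every copy combination by brute force, and the function names, gene labels, and distance values are all hypothetical.

```python
import math
from itertools import combinations, product


def orthogonal_fit(x, y):
    """Fit y = slope * x through the origin, minimizing squared orthogonal residuals.

    Assumption: unweighted fit; the method described above uses a weighted model.
    """
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    sxy = sum(a * b for a, b in zip(x, y))
    if sxy == 0:
        raise ValueError("distances are uncorrelated; slope is undefined")
    # Closed-form minimizer of sum((y - slope*x)^2) / (1 + slope^2).
    slope = ((syy - sxx) + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    ssr = sum((b - slope * a) ** 2 for a, b in zip(x, y)) / (1 + slope ** 2)
    return slope, ssr


def condensed(dist, genes):
    """Pairwise distances among `genes`, listed in a fixed pair order."""
    return [dist[frozenset(pair)] for pair in combinations(genes, 2)]


def best_pairing(dist_a, genes_a, dist_b, candidates_b):
    """Pick one copy per genome from family B that best matches family A.

    genes_a: one gene per genome. candidates_b: for each genome, in the same
    order, the copies of family B found in it. Brute force, for illustration only.
    """
    x = condensed(dist_a, genes_a)
    best = None
    for choice in product(*candidates_b):
        slope, ssr = orthogonal_fit(x, condensed(dist_b, choice))
        if best is None or ssr < best[0]:
            best = (ssr, slope, choice)
    return best


# Hypothetical distances for three genomes; family B has two copies in genome 3.
dist_a = {frozenset(("a1", "a2")): 0.1,
          frozenset(("a1", "a3")): 0.2,
          frozenset(("a2", "a3")): 0.3}
dist_b = {frozenset(("b1", "b2")): 0.2,
          frozenset(("b1", "b3")): 0.4,
          frozenset(("b2", "b3")): 0.6,
          frozenset(("b1", "b3x")): 0.9,
          frozenset(("b2", "b3x")): 0.1}

ssr, slope, choice = best_pairing(
    dist_a, ["a1", "a2", "a3"], dist_b, [["b1"], ["b2"], ["b3", "b3x"]]
)
print(choice, slope, ssr)
```

In this toy input the distances involving copy `b3` are exactly twice those of family A, so that copy is selected over `b3x`. Exhaustive enumeration grows exponentially with the number of multi-copy genomes, so a real analysis would need a more efficient pairing strategy.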