Non-phylogenetic identification of co-evolving genes for reconstructing the archaeal Tree of Life
Assessing the phylogenetic compatibility between individual gene families is a crucial and often computationally demanding step in many phylogenomic analyses. Here we describe the Evolutionary Similarity Index (I_ES), which assesses shared evolution between gene families by applying a weighted Orthogonal Distance Regression to sequence distances. This approach allows straightforward pairing of paralogs between co-evolving gene families without resorting to multiple tests or a priori
... of molecular interactions between the protein products of the assessed genes. Using pairwise distance matrices, while less informative than phylogenies, circumvents error-prone comparisons between trees whose topologies are inherently uncertain. Analyses of simulated tree datasets showed that I_ES was more accurate and less susceptible to phylogenetic noise than existing tree-based methods (Robinson-Foulds and geodesic distances) for assessing the compatibility of evolutionary signals. Applying I_ES to a real dataset of 1,322 genes from 42 archaeal genomes identified eight major clusters of co-evolving gene families. Four of these clusters included genes distributed across all archaeal phyla, while other clusters included subsets of taxa that do not map to generally accepted archaeal clades, suggesting possible shared horizontal transfers of co-evolving gene families. We identify one strongly connected set of 62 co-evolving genes, occurring both as single copies and as multiple homologs per genome, whose compatible evolutionary histories closely match previously published species trees for Archaea. An I_ES implementation is available at https://github.com/lthiberiol/evolSimIndex.
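The abstract does not give the I_ES formula, so the following is only a minimal Python sketch of the general idea of comparing two gene families through an orthogonal regression on their pairwise distances. It assumes each family has been reduced to one sequence per genome, that distances are stored in a dict keyed by sorted genome-name pairs, and that the fitted line passes through the origin. The functions `paired_distances` and `odr_through_origin` are hypothetical, and the returned `score` (one minus the ratio of the smaller to the larger eigenvalue of the weighted scatter matrix) is an illustrative stand-in, not the published index. Paralog pairing is not handled here.

```python
import math
from itertools import combinations


def paired_distances(dist_a, dist_b, genomes):
    """Collect matched distances for genome pairs present in both families.

    Hypothetical helper: dist_a and dist_b map a sorted (genome1, genome2)
    tuple to the sequence distance between that family's copies in those genomes.
    """
    xs, ys = [], []
    for pair in combinations(sorted(genomes), 2):
        if pair in dist_a and pair in dist_b:
            xs.append(dist_a[pair])
            ys.append(dist_b[pair])
    return xs, ys


def odr_through_origin(xs, ys, weights=None):
    """Weighted orthogonal (total least squares) fit of y = slope * x.

    Minimizes the weighted sum of squared perpendicular distances from each
    (x, y) point to a line through the origin. The line direction is the
    principal eigenvector of the weighted uncentered scatter matrix
    [[sxx, sxy], [sxy, syy]]; the smaller eigenvalue is the residual sum of squares.
    Returns (slope, score), where score = 1 - lam_min / lam_max is an
    illustrative goodness-of-fit value in [0, 1] (1 = perfectly collinear).
    """
    if weights is None:
        weights = [1.0] * len(xs)
    sxx = sum(w * x * x for w, x in zip(weights, xs))
    syy = sum(w * y * y for w, y in zip(weights, ys))
    sxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))
    if sxy == 0:
        raise ValueError("degenerate input: slope is undefined when sxy == 0")
    # Eigenvalues of a symmetric 2x2 matrix: mean of diagonal +/- discriminant.
    half_trace = (sxx + syy) / 2
    disc = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam_max = half_trace + disc
    lam_min = half_trace - disc
    # From (sxx - lam_max) * v1 + sxy * v2 = 0, the slope is v2 / v1.
    slope = (lam_max - sxx) / sxy
    score = 1.0 - lam_min / lam_max
    return slope, score
```

For example, two families whose distances differ only by a constant rate factor (say, one evolving twice as fast) fit a line with slope 2 and a score of 1, while discordant distance patterns lower the score. Fitting the orthogonal rather than the ordinary least-squares residuals treats both families' distances as noisy, which suits a comparison where neither family is the "independent" variable. For general ODR models beyond a line through the origin, SciPy's `scipy.odr` module offers a full implementation.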