Reconstructing the evolutionary history of a BCR lineage with minimum spanning tree and clonotype abundances [article]

Nika Abdollahi, Anne de Septenville, Frederic Davi, Juliana Silva Bernardes
2022 bioRxiv pre-print
B cell receptor (BCR) genes exposed to an antigen undergo somatic hypermutation and Darwinian antigen selection, generating a large diversity of BCRs and antibodies. This process, known as B cell affinity maturation, increases antibody affinity and forms a specific B cell lineage that includes the unmutated ancestor and its mutated variants. In a B cell lineage, cells with a higher antigen affinity undergo clonal expansion, while those with a lower affinity will not proliferate and probably be ... . Therefore, the abundance of different genotypes provides a valuable perspective on the ongoing evolutionary process. Phylogenetic tree inference is often used to reconstruct B cell lineage trees and to represent the evolutionary dynamics of BCR affinity maturation. However, such methods must handle B cell population data derived from experimental sampling, in which genotypes may differ in cellular abundance. Few phylogenetic methods exist for reconstructing the evolutionary history of B cell lineages: the best-performing solutions are time-consuming and restricted to a reduced number of BCR sequences, while time-efficient methods do not consider cellular abundances. We propose ClonalTree, a low-complexity and accurate approach to reconstructing BCR lineage trees that incorporates genotype abundances into minimum spanning tree (MST) algorithms. Using both simulated and experimental data, we demonstrate that ClonalTree outperforms MST-based algorithms and performs similarly to a method that explores the tree-generating space exhaustively. ClonalTree also runs faster, which makes it better suited to reconstructing phylogenetic lineage trees from high-throughput BCR sequencing data, particularly in biomedical applications, where short computation times are desirable. It is hundreds to thousands of times faster than exhaustive approaches, enabling the analysis of a large set of sequences within minutes or seconds and without loss of accuracy. The source code is freely available at
doi:10.1101/2022.02.27.481992 fatcat:74ratmzmu5affowidaumcu7zba
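
The idea of folding genotype abundances into an MST algorithm can be illustrated with a small sketch. The code below is not the ClonalTree implementation: it is a plain Prim-style construction rooted at the unmutated ancestor, with Hamming distance as the edge cost and, as an assumption made for illustration only, ties between equal-cost edges broken in favour of the more abundant child and then the more abundant parent. The function names, the tie-breaking rule, and the toy data are all hypothetical.

```python
from typing import Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def abundance_aware_mst(
    seqs: Dict[str, str],
    abundance: Dict[str, int],
    root: str,
) -> List[Tuple[str, str, int]]:
    """Grow a spanning tree from `root`, one genotype at a time (Prim-style).

    Assumed rule (illustrative, not taken from the paper): pick the cheapest
    edge leaving the tree; among equal-cost edges prefer the more abundant
    child, then the more abundant parent, then names for determinism.
    Returns a list of (parent, child, distance) edges in insertion order.
    """
    in_tree = [root]
    remaining = set(seqs) - {root}
    edges: List[Tuple[str, str, int]] = []
    while remaining:
        best = None
        # Naive scan over all tree/non-tree pairs; fine for a small example.
        for parent in in_tree:
            for child in remaining:
                d = hamming(seqs[parent], seqs[child])
                key = (d, -abundance[child], -abundance[parent], child, parent)
                if best is None or key < best[0]:
                    best = (key, parent, child, d)
        _, parent, child, d = best
        edges.append((parent, child, d))
        in_tree.append(child)
        remaining.remove(child)
    return edges


if __name__ == "__main__":
    # Toy lineage: names, sequences, and counts are made up.
    toy_seqs = {"germline": "AAAA", "c1": "AAAT", "c2": "AATA", "c3": "AATT"}
    toy_abundance = {"germline": 1, "c1": 50, "c2": 5, "c3": 10}
    for parent, child, dist in abundance_aware_mst(toy_seqs, toy_abundance, "germline"):
        print(f"{parent} -> {child} (distance {dist})")
```

In the toy data, `c1` and `c2` are both one mutation away from the germline; the sketch attaches the more abundant `c1` first, which is the kind of decision a distance-only MST would make arbitrarily. The nested scan is cubic in the number of genotypes and is meant only to keep the example short.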