Genomic variant identification methods alter Mycobacterium tuberculosis transmission inference [article]

Katharine S. Walter, Caroline Colijn, Ted Cohen, Barun Mathema, Qingyun Liu, Jolene R. Bowers, David M. Engelthaler, Apurva Narechania, Julio Croda, Jason R. Andrews
2019 bioRxiv   pre-print
Pathogen genomic data are increasingly used to characterize global and local transmission patterns of important human pathogens and to inform public health interventions. Yet there is no current consensus on how to measure genomic variation. We investigated the effects of variant identification approaches on transmission inferences for M. tuberculosis by comparing variants identified by five different groups in the same sequence data from a clonal outbreak. We then measured the performance of commonly used variant calling approaches in recovering variation in a simulated tuberculosis outbreak and tested the effect of applying increasingly stringent filters on transmission inferences and phylogenies. We found that variant calling approaches used by different groups do not recover consistent sets of variants, often leading to conflicting transmission inferences. Further, performance in recovering true outbreak variation varied widely across approaches. Finally, stringent filters rapidly eroded the accuracy of transmission inferences and quality of phylogenies reconstructed from outbreak variation. We conclude that measurements of genetic distance and phylogenetic structure are dependent on variant calling approach. Variant calling algorithms trained upon true sequence data outperform other approaches and enable inclusion of repetitive regions typically excluded from genomic epidemiology studies, maximizing the information gleaned from outbreak genomes.
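As a rough illustration of why genetic distance, and therefore transmission inference, can depend on the variant calling approach, the following minimal Python sketch compares pairwise SNP distances for the same isolates under two call sets. Everything in it is an assumption made for illustration: the toy sequences, the isolate names, the `snp_distance` and `inferred_links` helpers, and the distance threshold are hypothetical and are not taken from the study.

```python
from itertools import combinations


def snp_distance(a, b):
    """Count sites where two aligned sequences differ, skipping masked 'N' sites."""
    return sum(1 for x, y in zip(a, b) if x != y and "N" not in (x, y))


def inferred_links(alignment, threshold):
    """Return isolate pairs whose pairwise SNP distance is at or below the threshold."""
    return {
        (i, j)
        for i, j in combinations(sorted(alignment), 2)
        if snp_distance(alignment[i], alignment[j]) <= threshold
    }


# Hypothetical toy alignments: the same three isolates under two call sets.
# calls_unfiltered keeps every site; calls_filtered masks two sites with 'N',
# standing in for a stringent filter that drops a difficult region.
calls_unfiltered = {"iso1": "ACGTACGTAC", "iso2": "ACGTACGTAT", "iso3": "ACGAACCTAT"}
calls_filtered = {"iso1": "ACGNACNTAC", "iso2": "ACGNACNTAT", "iso3": "ACGNACNTAT"}

THRESHOLD = 1  # hypothetical cutoff, chosen only to make the contrast visible

links_unfiltered = inferred_links(calls_unfiltered, THRESHOLD)
links_filtered = inferred_links(calls_filtered, THRESHOLD)

print(sorted(links_unfiltered))
print(sorted(links_filtered))
```

With these toy inputs, the unfiltered call set links only one pair of isolates at the chosen threshold, while the filtered call set links all three pairs, because masking removes the sites that distinguish the third isolate.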
doi:10.1101/733642 fatcat:fgq7oyhkrvgmhhu55tkb5h262e