Deep learning predicts tuberculosis drug resistance status from genome sequencing data [article]

Michael L. Chen, Akshith Doddi, Jimmy Royer, Luca Freschi, Marco Schito, Matthew Ezewudo, Isaac S. Kohane, Andrew Beam, Maha Farhat
2018 bioRxiv   pre-print
The diagnosis of multidrug resistant and extensively drug resistant tuberculosis is a global health priority. Whole genome sequencing of clinical Mycobacterium tuberculosis isolates promises to circumvent the long wait times and limited scope of conventional phenotypic antimicrobial susceptibility, but gaps remain for predicting phenotype accurately from genotypic data. Methods and Findings: Using targeted or whole genome sequencing and conventional drug resistance phenotyping data from 3,601
more » ... cobacterium tuberculosis strains, 1,228 of which were multidrug resistant, we investigated the use of machine learning to predict phenotypic drug resistance to 10 anti-tuberculosis drugs. The final model, a multitask wide and deep neural network (MD-WDNN), achieved improved high predictive performance: the average AUCs were 0.979 for first-line drugs and 0.936 for second-line drugs during repeated cross-validation. On an independent validation set, the MD-WDNN showed average AUCs, sensitivities, and specificities, respectively, of 0.937, 87.9%, and 92.7% for first-line drugs and 0.891, 82.0% and 90.1% for second-line drugs. In addition to being able to learn from samples that have only been partially phenotyped, our proposed multidrug architecture shares information across different anti-tuberculosis drugs and genes to provide a more accurate phenotypic prediction. We use t-distributed Stochastic Neighbor Embedding (t-SNE) visualization and feature importance analyses to examine inter-drug similarities. Conclusions: Machine learning is capable of accurately predicting resistant status using genomic information and holds promise in bringing sequencing technologies closer to the bedside.
doi:10.1101/275628 fatcat:w4nmgtulqreqfkood3fatekxtu