A novel algorithm to accurately classify metagenomic sequences [article]

Subrata Saha, Zigeng Wang, Sanguthevar Rajasekaran
2020, bioRxiv pre-print
Widespread availability of next-generation sequencing (NGS) technologies has prompted a recent surge of interest in the microbiome. As a consequence, metagenomics is a fast-growing field in bioinformatics and computational biology. An important problem in analyzing metagenomic sequencing data is to identify the microbes present in a sample and estimate their relative abundances. In this article we propose a highly efficient algorithm, dubbed the Hybrid Metagenomic Sequence Classifier (HMSC), to accurately detect microbes and their relative abundances in a metagenomic sample. The algorithmic approach is fundamentally different from other state-of-the-art algorithms in this domain: HMSC judiciously exploits both alignment-free and alignment-based approaches to accurately characterize metagenomic sequencing data. To demonstrate the effectiveness of HMSC, we used 8 metagenomic sequencing datasets (2 mock and 6 in silico bacterial communities) produced by 3 different sequencing technologies (HiSeq, MiSeq, and NovaSeq) with realistic error models and abundance distributions. Rigorous experimental evaluations show that HMSC is an effective, scalable, and efficient algorithm compared to other state-of-the-art methods in terms of accuracy, memory, and runtime.
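The abstract does not describe how HMSC combines its two stages, so the following is only a generic toy sketch, under stated assumptions, of what "alignment-free plus alignment-based" classification and relative-abundance estimation can look like. It is not the authors' method. Assumptions: references are plain strings keyed by a taxon name; the alignment-free stage ranks references by shared k-mers; the alignment-based stage is simplified to an ungapped, mismatch-only placement; relative abundance is the fraction of classified reads per reference. All function names and the parameters `k` and `max_mismatch_frac` are hypothetical.

```python
from collections import Counter


def kmer_set(seq, k):
    """All distinct length-k substrings of seq (alignment-free features)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def min_ungapped_mismatches(read, ref):
    """Simplified alignment step: slide read along ref with no gaps and
    return the fewest mismatches over all placements (len(read) if none fit)."""
    best = len(read)
    for start in range(len(ref) - len(read) + 1):
        window = ref[start:start + len(read)]
        best = min(best, sum(a != b for a, b in zip(read, window)))
    return best


def classify_read(read, refs, k=4, max_mismatch_frac=0.1):
    """Hypothetical hybrid rule, NOT the HMSC algorithm:
    1) alignment-free: rank references by the number of shared k-mers;
    2) alignment-based: accept the best-ranked reference whose ungapped
       placement has at most max_mismatch_frac * len(read) mismatches."""
    read_kmers = kmer_set(read, k)
    scores = {name: len(read_kmers & kmer_set(seq, k)) for name, seq in refs.items()}
    # Highest k-mer score first; ties broken by name for determinism.
    for name, score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        if score == 0:
            break  # remaining candidates share no k-mers with the read
        if min_ungapped_mismatches(read, refs[name]) <= max_mismatch_frac * len(read):
            return name
    return None  # read stays unclassified


def relative_abundance(reads, refs, k=4, max_mismatch_frac=0.1):
    """Fraction of classified reads assigned to each reference.
    Unclassified reads are ignored; no genome-length normalization."""
    counts = Counter()
    for read in reads:
        label = classify_read(read, refs, k, max_mismatch_frac)
        if label is not None:
            counts[label] += 1
    total = sum(counts.values())
    return {name: c / total for name, c in counts.items()} if total else {}


if __name__ == "__main__":
    refs = {"A": "AAAACCCCGGGGTTTT", "B": "ATATATATGCGCGCGC"}
    reads = ["AAAACCCC", "CCGGGGTT", "ATATGCGC", "GGGGGGGG"]
    print(relative_abundance(reads, refs))
    # {'A': 0.6666666666666666, 'B': 0.3333333333333333}
```

In this toy run the read `GGGGGGGG` shares a k-mer with reference A, so it passes the cheap alignment-free screen, but it is rejected by the alignment check and left unclassified. This is the general motivation for pairing the two kinds of evidence. A practical tool would index reference k-mers once rather than per read, handle gapped alignments and sequencing errors, and typically normalize abundances by genome length.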
doi:10.1101/2020.10.01.321067