SIAMCAT: user-friendly and versatile machine learning workflows for statistically rigorous microbiome analyses [article]

Jakob Wirbel, Konrad Zych, Morgan Essex, Nicolai Karcher, Ece Kartal, Guillem Salazar, Peer Bork, Shinichi Sunagawa, Georg Zeller
2020 bioRxiv pre-print
The human microbiome is increasingly mined for diagnostic and therapeutic biomarkers. However, computational tools tailored to such analyses are still scarce. Here, we present the SIAMCAT R package, a versatile and user-friendly toolbox for comparative metagenome analyses using machine learning (ML), statistical tests, and visualization. Based on a large meta-analysis of gut microbiome studies, we optimized the choice of ML algorithms and preprocessing routines for default workflow settings.
Furthermore, we illustrate common pitfalls leading to overfitting and show how SIAMCAT safeguards against this to make statistically rigorous ML workflows broadly accessible. SIAMCAT is available from siamcat.embl.de and Bioconductor.
doi:10.1101/2020.02.06.931808 fatcat:jraubeuycrbt5chykvybdlzzuy