Integrative computational microbial genomics for large-scale metagenomic analyses

Francesco Beghini
2021
Advances in DNA sequencing technologies and improvements in analytic methods have changed the way we analyze complex microbial communities (metagenomics). In only a few years, these methods have evolved to enable more precise community profiling and high-level strain resolution. A typical computational metagenomic analysis relies on mapping raw DNA sequencing reads against sets of "reference" microbial genomes, usually obtained through single-isolate sequencing. With an almost exponential increase in the number of reference genomes deposited daily in public data sets, current computational methods are incapable of managing and exploiting such a rich reference set, limiting the potential of metagenomic investigations. In my doctoral thesis, I will present my contribution towards fully exploiting the available reference data for metagenomic analysis. I developed ChocoPhlAn, an integrated pipeline for automatic retrieval, organization, and annotation of reference genomes and gene families, as the foundation for bioBakery 3, an improved set of computational methods for the analysis of shotgun metagenomics data. Using the latest set of microbial genomic reference data available, processed through ChocoPhlAn, the six bioBakery 3 tools that I updated produced more comprehensive and higher-resolution taxonomic and functional profiling of microbiomes and allowed strain-level characterization of their constituents. After extensive benchmarks against previous versions and competitors, we applied those methods to more than 10,000 real metagenomes and showed how metagenomics can be a more powerful tool for identifying novel links between the gut microbiome and disease conditions such as colorectal cancer and inflammatory bowel disease. Accurate strain-level phylogeny reconstruction and pangenomic analysis of 7,783 metagenomes revealed novel functional, phylogenetic, and geographic diversity of Ruminococcus bromii, a common and highly prevalent gut inhabitant. We then focused on the influence of the Eukary [...]
doi:10.15168/11572_296396 fatcat:dzdsnnuhdbgbjcmwyejvuxuo4e