MetaTrans: an open-source pipeline for metatranscriptomics

Xavier Martinez, Marta Pozuelo, Victoria Pascal, David Campos, Ivo Gut, Marta Gut, Fernando Azpiroz, Francisco Guarner, Chaysavanh Manichanh
2016 Scientific Reports  
To date, meta-omic approaches use high-throughput sequencing technologies, which produce a huge amount of data, thus challenging modern computers. Here we present MetaTrans, an efficient open-source pipeline to analyze the structure and functions of active microbial communities using the power of multi-threading computers. The pipeline is designed to perform two types of RNA-Seq analyses: taxonomic and gene expression. It performs quality-control assessment and rRNA removal, maps reads against ... tional databases, and also handles differential gene expression analysis. Its efficacy was validated by analyzing data from synthetic mock communities, data from a previous study and data generated from twelve human fecal samples. Compared to an existing web application server, MetaTrans shows greater efficiency in terms of runtime (around 2 hours per million transcripts) and presents adapted tools to compare gene expression levels. It has been tested with a human gut microbiome database but also offers an option to use a general database in order to analyze other ecosystems. For the installation and use of the pipeline, we provide a detailed guide at the following website (www.metatrans.org).
In the last decade, next-generation sequencing technologies have allowed sequencing at very low cost and have thus boosted the use of meta-omic approaches to study microbial communities. To date, the main challenge is to develop and optimize reliable tools that take advantage of current multi-threading computers to analyze the huge amount of data generated by high-throughput sequencing technologies. Over the last decade, the human microbiome has been the focus of important international consortia such as the Human Microbiome Project, an NIH initiative, and MetaHIT, a European consortium. These consortia have deposited catalogues of microbial genes on an unprecedented scale 1,2. Metagenomics aims at cataloging the genes present in a sample, while the study of RNAs, called metatranscriptomics, provides an opportunity to gain insights into the functionality of microbial communities. By assessing the genes expressed by the microbial community, metatranscriptomics gives a mechanistic understanding of inter-community relationships and the crosstalk between a microbial community and its host 3,4.
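As an illustration only, the stages named in the abstract (quality control, rRNA removal, and later steps) can be pictured as a chain of interchangeable functions applied to each sample, with samples processed concurrently. The sketch below is not the MetaTrans implementation: the function names, the length threshold, and the `TOY_RRNA_MOTIF` string are all hypothetical placeholders chosen for the example, and a real pipeline would normally delegate each stage to an external, natively multi-threaded tool.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Reads = List[str]
Stage = Callable[[Reads], Reads]

# Hypothetical placeholder motif; NOT a real rRNA signature.
TOY_RRNA_MOTIF = "ACGTACGT"


def quality_filter(reads: Reads) -> Reads:
    """Toy quality control: keep reads of at least 10 bases with no ambiguous 'N'."""
    return [r for r in reads if len(r) >= 10 and "N" not in r]


def remove_rrna(reads: Reads) -> Reads:
    """Toy rRNA removal: drop reads containing the placeholder motif."""
    return [r for r in reads if TOY_RRNA_MOTIF not in r]


def run_stages(reads: Reads, stages: List[Stage]) -> Reads:
    """Apply each stage in order; any stage can be swapped for another callable."""
    for stage in stages:
        reads = stage(reads)
    return reads


def run_samples(samples: Dict[str, Reads], stages: List[Stage],
                workers: int = 4) -> Dict[str, Reads]:
    """Run the same stage list over several samples using a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(run_stages, reads, stages)
                   for name, reads in samples.items()}
        return {name: fut.result() for name, fut in futures.items()}


if __name__ == "__main__":
    samples = {
        "s1": ["AAAAACCCCCGG", "ACGTACGTTTTTT", "AAN", "TTTTTGGGGGCC"],
        "s2": ["ACG"],
    }
    print(run_samples(samples, [quality_filter, remove_rrna]))
```

Because each stage is just a callable taking and returning a list of reads, a third-party step could be added by appending another function to the stage list. Note that Python threads do not speed up CPU-bound pure-Python work; the thread pool here only stands in for the idea of running samples concurrently.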
Previous transcriptomic 5 and metatranscriptomic 6-8 studies developed various approaches to analyze RNA-Seq experiments; however, the particularities of distinct experimental methods hinder the development of a generic pipeline that covers all possible scenarios. It is important that such tools be not only flexible and adaptable but also efficient, both in terms of runtime and memory footprint. Here we present a downloadable, open-source, effective and efficient metatranscriptomic pipeline developed for paired-end RNA-Seq analysis and easily adaptable to other high-throughput experiments. Given the rapid emergence of research into metatranscriptomics, additional bioinformatics tools are likely to be developed for specific tasks in the near future and will probably serve to improve our pipeline. Thus, we designed the pipeline to facilitate the inclusion of such third-party tools in each of its stages. Our pipeline was designed to perform two types of RNA-Seq analyses, namely those addressing 16S rRNA taxonomy and gene expression. To test the present metatranscriptomic pipeline, we analyzed synthetic mock communities and twelve fecal samples collected from eight individuals, obtained from a previous study 9 and from an unpublished one. For four individuals, stool samples were collected and intestinal gas volume was measured before and after a three-day flatulogenic diet challenge. In the present study, we believe that combining 16S rDNA, 16S rRNA and mRNA data can provide a new perspective on the factors involved in the origin of flatulence. Using these samples, we extracted total RNA, performed an rRNA removal step in a set of four samples
doi:10.1038/srep26447 pmid:27211518 pmcid:PMC4876386 fatcat:bdqoscfvgjatdharwhwshe7idq