Peer Review #1 of "Compact graphical representation of phylogenetic data and metadata with GraPhlAn (v0.1)"
[peer_review]
2015
unpublished
The increased availability of genomic and metagenomic data poses challenges at multiple analysis levels, including visualization of very large-scale microbial and microbial community data paired with rich metadata. We developed GraPhlAn (Graphical Phylogenetic Analysis), a computational tool that produces high-quality, compact visualizations of microbial genomes and metagenomes. This includes phylogenies spanning up to thousands of taxa, annotated with metadata ranging from microbial community abundances to microbial physiology or host and environmental phenotypes. GraPhlAn has been developed as an open-source command-driven tool in order to be easily integrated into complex, publication-quality bioinformatics pipelines. It can be executed either locally or through an online Galaxy web application. We present several examples including taxonomic and phylogenetic visualization of microbial communities, metabolic functions, and biomarker discovery that illustrate GraPhlAn's potential for modern microbial and community genomics.
Introduction
Modern high-throughput sequencing technologies provide comprehensive, large-scale datasets that have enabled a variety of novel genomic and metagenomic studies.
A large number of statistical and computational tools have been developed specifically to tackle the complexity and high-dimensionality of such datasets and to provide robust and interpretable results. Visualizing data including thousands of microbial genomes or metagenomes, however, remains a challenging task that is often crucial to driving exploratory data mining and to compactly summarizing quantitative conclusions.
PeerJ reviewing PDF | (2015:03:4403:1:0:NEW 22 May 2015) Reviewing Manuscript
In the specific context of microbial genomics and metagenomics, next-generation sequencing in particular produces datasets of unprecedented size, including thousands of newly sequenced microbial genomes per month and a tremendous increase in genetic diversity sampled by isolates or culture-free assays. Displaying phylogenies with thousands of microbial taxa in hundreds of samples is infeasible with most available tools. This is especially true when sequencing profiles need to be placed in the context of sample metadata (e.g. clinical information). Among recently developed tools, iTOL (Letunic & Bork 2007; Letunic & Bork 2011) targets interactive analyses of large-scale phylogenies with a moderate amount of overlaid metadata, whereas ETE (Huerta-Cepas et al. 2010) is a Python programming toolkit focusing on tree exploration and visualization that is targeted at scientific programmers, and Krona (Ondov et al. 2011) emphasizes hierarchical quantitative information typically derived from metagenomic taxonomic profiles. None of these tools provides an automatable environment for non-computationally expert users in which very large phylogenies can be combined with high-dimensional metadata such as microbial community abundances, host or environmental phenotypes, or microbial physiological properties.
In particular, a successful high-throughput genomic visualization environment for modern microbial informatics must satisfy two criteria. First, software releases must be free and open-source to allow other researchers to verify the software, to adapt it to their specific needs, and to cope with the quick evolution of data types and dataset sizes. Second, visualization tools must be command-driven in order to be embedded in computational pipelines. This allows for a higher degree of analysis reproducibility, but the software must correspondingly be available for local installation and callable through a convenient interface (e.g. API or general scripting language). Local installations also have the advantage of avoiding the transfer of large or sensitive data to remote servers, preventing potential issues with the confidentiality of unpublished biological data. Neither of these criteria, of course, prevents tools from also being embeddable in web-based interfaces in order to facilitate use by users with limited computational expertise (Blankenberg et al. 2010; Giardine et al. 2005; Goecks et al. 2010; Oinn et al. 2004), and all such tools must in any case produce informative, clear, detailed, and publication-ready visualizations.
Materials & Methods
GraPhlAn is a new tool for compact and publication-quality representation of circular taxonomic and phylogenetic trees with potentially rich sets of associated metadata. It was developed primarily for microbial genomic and microbiome-related studies in which the complex phylogenetic/taxonomic structure of microbial communities needs to be complemented with quantitative and qualitative sample-associated metadata. GraPhlAn is available at http://cibiocm.bitbucket.org/tools/graphlan.html.
Implementation strategy
GraPhlAn is composed of two Python modules: one for drawing the image and one for adding annotations to the tree.
GraPhlAn uses the annotation file to highlight and personalize the appearance of the tree and of the associated information. The annotation file does not modify the structure of the tree; it only changes the way in which nodes and branches are displayed. Internally, GraPhlAn uses the matplotlib library (Hunter 2007) to perform the drawing functions.
The export2graphlan module
Export2graphlan is a framework for easily integrating GraPhlAn into existing bioinformatics pipelines. Export2graphlan makes use of two external libraries: the pandas
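The separation described under Implementation strategy, in which annotations change how nodes are displayed but never alter the tree itself, can be illustrated with a minimal sketch. This is not GraPhlAn's code and it does not call GraPhlAn or matplotlib: the toy tree, the function names, and the simplified three-column tab-separated annotation format (clade, property, value) are all assumptions made for illustration. Leaves are spaced evenly around a circle and each internal node is placed at the mean angle of its children.

```python
import math

# Hypothetical toy tree: each key is a clade, each value holds its children.
TREE = {
    "Bacteria": {
        "Firmicutes": {"Bacilli": {}, "Clostridia": {}},
        "Proteobacteria": {"Gammaproteobacteria": {}},
    }
}

# Assumed simplified annotation format: clade <TAB> property <TAB> value.
ANNOTATIONS = (
    "Firmicutes\tclade_marker_color\tred\n"
    "Bacilli\tclade_marker_size\t40\n"
)


def leaves(tree):
    """Return leaf names in depth-first order."""
    found = []
    for name, children in tree.items():
        found.extend(leaves(children) if children else [name])
    return found


def circular_layout(tree):
    """Map each node to (radius, angle); radius is the node's depth."""
    leaf_names = leaves(tree)
    step = 2 * math.pi / len(leaf_names)
    leaf_angle = {name: i * step for i, name in enumerate(leaf_names)}
    coords = {}

    def place(subtree, depth):
        angles = []
        for name, children in subtree.items():
            # Internal nodes sit at the mean angle of their children.
            angle = place(children, depth + 1) if children else leaf_angle[name]
            coords[name] = (depth, angle)
            angles.append(angle)
        return sum(angles) / len(angles)

    place(tree, 0)
    return coords


def parse_annotations(text):
    """Collect display properties per clade; the tree is never touched."""
    styles = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        clade, prop, value = line.split("\t")
        styles.setdefault(clade, {})[prop] = value
    return styles


def describe(tree, annotation_text):
    """Combine layout (from structure) with styling (from annotations)."""
    styles = parse_annotations(annotation_text)
    return {
        name: {"radius": radius, "angle": angle, **styles.get(name, {})}
        for name, (radius, angle) in circular_layout(tree).items()
    }


nodes = describe(TREE, ANNOTATIONS)
```

Because the layout is computed from the tree alone, rerunning `describe` with a different annotation text changes only the style keys of each node, never its radius or angle.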
doi:10.7287/peerj.1029v0.1/reviews/1
fatcat:6d4j3inno5e2bmmkecldatitt4