RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes

Thomas Brettin, James J. Davis, Terry Disz, Robert A. Edwards, Svetlana Gerdes, Gary J. Olsen, Robert Olson, Ross Overbeek, Bruce Parrello, Gordon D. Pusch, Maulik Shukla, James A. Thomason (+4 others)
2015 Scientific Reports  
The RAST (Rapid Annotation using Subsystem Technology) annotation engine was built in 2008 to annotate bacterial and archaeal genomes. It works by offering a standard software pipeline for identifying genomic features (i.e., protein-encoding genes and RNA) and annotating their functions. Recently, in order to make RAST a more useful research tool and to keep pace with advancements in bioinformatics, it has become desirable to build a version of RAST that is both customizable and extensible. In
more » ... his paper, we describe the RAST tool kit (RASTtk), a modular version of RAST that enables researchers to build custom annotation pipelines. RASTtk offers a choice of software for identifying and annotating genomic features as well as the ability to add custom features to an annotation job. RASTtk also accommodates the batch submission of genomes and the ability to customize annotation protocols for batch submissions. This is the first major software restructuring of RAST since its inception. T he last two decades of research have brought vast changes to the field of genomics. The sequencing of a genome and the subsequent annotation of gene functions that were originally performed by teams of researchers expending thousands of man-hours of labor has become a standard laboratory technique that can be performed by a single person in one day. As sequencing technology has advanced and the cost has dropped, the number of genomes being deposited into the public databases has outpaced Moore's Law 1,2 . This has shifted the bottlenecks in genomic analysis from the sequencing per se to the tools that are used for annotation and genomic analysis. In 2008, the RAST server (Rapid Annotation using Subsystem Technology) was developed to annotate microbial genomes 3,4 . It works by projecting manually curated gene annotations from the SEED database onto newly submitted genomes 5-7 . The key to the consistency and accuracy of the RAST algorithm has been the carefully structured annotation data in the SEED, which are organized into subsystems (sets of logically related functional roles) 5 . As a result, RAST has become one of the most popular sources for consistent and accurate annotations for microbial genomes. The RAST community currently consists of ,10,000 active users who have contributed an average of 1,170 microbial genomes per week in the last year. It is also being used as the foundation for maintaining consistency for automated metabolic modeling in the ModelSEED 8 and KBase (kbase.us), and for comparative genomics in the bacterial pathogen database, PATRIC 9,10 . RAST and other annotation engines encapsulate software for identifying and annotating specific genomic features into a standard annotation pipeline [11] [12] [13] [14] [15] [16] . This approach has several advantages including offering speed, convenience and consistency to the user. In order to annotate with RAST, users submit their contigs to the server where the computation is performed. This frees users from having to download and install multiple programs, or to perform intensive computations. However, despite these advantages, this approach also has limitations. For OPEN SUBJECT AREAS: COMPARATIVE GENOMICS BIOINFORMATICS
doi:10.1038/srep08365 pmid:25666585 pmcid:PMC4322359 fatcat:6wmo7ui4izfuxf5wqbykx7lfke