BioSAILs: versatile workflow management for high-throughput data analysis
Abstract

Motivation
High-throughput analysis in the current era of systems biology encompasses a range of analysis workflows for different applications, such as transcriptomics, epigenetics, variant discovery, and de novo genome assembly. Many research institutes house a genomics core facility that is responsible for generating high-throughput sequencing data and often includes a core bioinformatics team that carries out data analysis. Core teams must both keep track of data and maintain
... software packages and databases, which must be scalable according to ever-changing research needs, and they must allocate computational resources to them. Typical data analysis pipelines involve multiple software packages and require the ability to install, execute, and maintain software stacks on a variety of hardware/operating system configurations, including high-performance computing (HPC) facilities, stand-alone servers, and Cloud services. At the same time, individual researchers need to analyze their data and share their analysis steps with collaborators. Relying on ad hoc scripts and rigid, internally developed pipelines is inefficient, difficult to track and maintain, and ultimately limits the ability to adapt and evolve as research methods progress.

Results
Here, we present BioSAILs (Bioinformatics Standardized Analysis Information Layers), a scientific workflow management system (WMS) developed by the Core Bioinformatics team at NYU Abu Dhabi. BioSAILs comprises two central components, BioX command and HPCRunner command, supported by BioStacks software stacks. BioX structures executable workflows that may either be run directly on a desktop or lab server, or be submitted by HPCRunner to HPC infrastructure or the Cloud. BioSAILs is supported by a range of pre-configured, customizable BioStacks software stacks for various applications (e.g. RNA-seq, de novo genome/transcriptome assembly, and variant discovery). The software stacks are built using Conda and BioConda (6), have been pre-packaged into modules using EasyBuild, and can be deployed with or without BioSAILs. Within the BioSAILs WMS, users can also invoke commands that take advantage of containerized software in Docker (10) and Singularity (8). Finally, the BioSAILs web resource provides documentation, blog posts related to various BioSAILs analysis tasks, forums, a knowledge base and FAQs, as well as an interactive web-based workflow editor/creator.
BioSAILs is production-level software that has been in use as the main WMS at NYU Abu Dhabi for the past two years and is continuously developed and maintained by the core bioinformatics team.

Availability
BioSAILs is open source software, available under the GNU license. Online documentation, support and guides can be found at biosails.abudhabi.nyu.edu. The public GitHub project is available at GitHub – BioSAILs. HPC-Runner and BioX-Workflow can be installed through Conda using the BioConda channel.