OUP accepted manuscript

2017 Briefings in Bioinformatics  
Data workflow systems (DWFSs) enable bioinformatics researchers to combine components for data access and data analytics, and to share the final data analytics approach with their collaborators. Increasingly, such systems have to cope with large-scale data, such as full genomes (about 200 GB each), public fact repositories (about 100 TB of data) and 3D imaging data at even larger scales. As moving the data becomes cumbersome, the DWFS needs to embed its processes into a cloud infrastructure,
where the data are already hosted. As the standardized public data play an increasingly important role, the DWFS needs to comply with Semantic Web technologies. This advancement to DWFS would reduce overhead costs and accelerate the progress in bioinformatics research based on large-scale data and public resources, as researchers would require less specialized IT knowledge for the implementation. Furthermore, the high data growth rates in bioinformatics research drive the demand for parallel and distributed computing, which then imposes a need for scalability and high-throughput capabilities onto the DWFS. As a result, requirements for data sharing and access to public knowledge bases suggest that compliance of the DWFS with Semantic Web standards is necessary. In this article, we will analyze the existing DWFS with regard to their capabilities toward public open data use as well as large-scale computational and human interface requirements. We untangle the parameters for selecting a preferable solution for bioinformatics research with

Md. Rezaul Karim is a PhD researcher at Semantics in eHealth and Life Sciences with expertise in the development of computational resources for the analysis and visualization of ribosome profiling (RiboSeq) and high-throughput gene expression data. She is the coordinator of RiboSeq.Org (http://riboseq.org/). Achille Zappa is a Postdoctoral researcher
doi:10.1093/bib/bbx039 pmid:28419324 pmcid:PMC6169675 fatcat:xsdfepwqmnb7pgkmnep2ifn2wi