Challenges and approaches for distributed workflow-driven analysis of large-scale biological data

Ilkay Altintas, Jianwu Wang, Daniel Crawl, Weizhong Li
2012 Proceedings of the 2012 Joint EDBT/ICDT Workshops on - EDBT-ICDT '12  
Next-generation DNA sequencing machines are generating a very large amount of sequence data with applications in many scientific challenges and placing unprecedented demands on traditional single-processor bioinformatics algorithms. Middleware and technologies for scientific workflows and data-intensive computing promise new capabilities to enable rapid analysis of next-generation sequence data. Based on this motivation and our previous experiences in bioinformatics and distributed scientific workflows, we are creating a Kepler Scientific Workflow System module, called "bioKepler", that facilitates the development of Kepler workflows for integrated execution of bioinformatics applications in distributed environments. This vision paper discusses the challenges related to next-generation sequencing data, explains the approaches taken in bioKepler to help with analysis of such data, and presents preliminary results demonstrating these approaches.
doi:10.1145/2320765.2320791 dblp:conf/edbt/AltintasWCL12 fatcat:lot2dlhp4fh45izbyqdiw3ta2y