Bio-Node – Bioinformatics in the Cloud [article]

Yannick Spreen, Maximilian Miller
2020 bioRxiv pre-print
Motivation: The applicability and reproducibility of bioinformatics methods and results often depend on the structure and software architecture of their development. Exponentially growing data sets require ever more optimization and performance, while conventional computing capacities fall short of these demands. This creates a large overhead for software development in a research area that is primarily interested in solving complex biological problems rather than developing new, performant software solutions. In pure computer science, new structures in the field of web development have produced more efficient processes for container-based software solutions. The advantages of these structures have rarely been explored on a broader scientific scale. The same is true of the trend of migrating computations from on-premises resources to the cloud.
Results: We created Bio-Node, a new platform for large-scale bio data analysis utilizing cloud compute resources (publicly available at https://bio-node.de). Bio-Node enables building complex workflows using a sophisticated web interface. We applied Bio-Node to implement bioinformatic workflows for rapid metagenome function annotation. We further developed "Auto-Clustering", a workflow that automatically extracts the most suitable clustering parameters for specific data types and subsequently enables optimal segregation of unknown samples of the same type. Compared to existing methods and approaches, Bio-Node improves the performance and reduces the cost of bioinformatics data analyses while providing an easier and faster development process with a focus on reproducibility and reusability.
doi:10.1101/2020.04.15.043596 fatcat:egn2d5uc2bfshdxg4dj2lm7epa