Bio and health informatics meets cloud: BioVLab as an example

Heejoon Chae, Inuk Jung, Hyungro Lee, Suresh Marru, Seong-Whan Lee, Sun Kim
2013 Health Information Science and Systems  
The exponential increase in genomic data brought by the advent of next or third generation sequencing (NGS) technologies, together with the dramatic drop in sequencing cost, has driven the biological and medical sciences to become data-driven sciences. This paradigm shift comes with challenges in the transfer, storage, computation, and analysis of big bio/medical data. Cloud computing is a service model that shares a pool of configurable resources, which makes it a suitable workbench for addressing these challenges. From the medical or biological perspective, the provision of computing power and storage is the most attractive feature of cloud computing for handling the ever-increasing volume of biological data. As data grow in size, many research organizations begin to experience a lack of computing power, which becomes a major hurdle in achieving research goals. In this paper, we review the features of publicly available bio and health cloud systems in terms of graphical user interface, external data integration, security, and extensibility of features. We then discuss issues and limitations of current cloud systems and conclude by suggesting a biological cloud environment concept, which can be defined as a total workbench environment that assembles computational tools and databases for analyzing bio/medical big data in particular application domains.

Review

High-throughput, massively parallel sequencing technologies, called next generation sequencing (NGS) technologies, were first introduced in the late 1990s. This leap in sequencing technology dramatically reduced sequencing cost and caused an exponential increase in the volume of data. Compared with the traditional Sanger sequencing technology used for the first human genome project, a three billion dollar project spanning 10 years, the cost and time for sequencing a single human genome have dropped by factors of 3,000,000 and 3,000, respectively [1,2]. As a result, the nucleotide sequence resource in the EMBL-Bank DB has approximately doubled over the past 5 years [3,4]. Sequencing throughput has increased fivefold per year, whereas computer performance generally follows Moore's Law, doubling only every 18 or 24 months. Thus the gap
doi:10.1186/2047-2501-1-6 pmid:25825658 pmcid:PMC4336112 fatcat:k3s3fhlv4jcfzdq7nexnpex46u