Cloud computing paradigms for pleasingly parallel biomedical applications

Thilina Gunarathne, Tak-Lon Wu, Judy Qiu, Geoffrey Fox
2010 Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing - HPDC '10  
Cloud computing offers exciting new approaches to scientific computing that leverage the hardware and software investments made by major commercial players in large-scale data centers. Loosely coupled problems are very important in many scientific fields and are on the rise with the ongoing move towards data-intensive computing. Several approaches exist for leveraging clouds and cloud-oriented data processing frameworks to perform pleasingly parallel computations. In this paper we present two pleasingly parallel biomedical applications, (1) assembly of genome fragments and (2) dimension reduction in the analysis of chemical structures, implemented using the cloud-infrastructure-service-based utility computing models of Amazon AWS and Microsoft Windows Azure, as well as the MapReduce-based data processing frameworks Apache Hadoop and Microsoft DryadLINQ. We review each of the frameworks and perform a comparative study among them based on performance, efficiency, cost, and usability. The cloud-service-based utility computing model and managed parallelism (MapReduce) exhibited comparable performance and efficiency for the applications we considered. We analyze the variations in cost between the different platform choices (e.g., EC2 instance types), highlighting the need to select the appropriate platform based on the nature of the computation.
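A pleasingly parallel computation splits its input into independent units, processes each unit with no communication between tasks, and is then judged by parallel efficiency and by what it costs to run. The following is a minimal Python sketch of that pattern, not the authors' implementation. Everything in it is an assumption for illustration: `process_task` is a hypothetical stand-in for one independent unit of work (such as assembling one file of genome fragments); `run_pleasingly_parallel` plays the role of a map-only job or a queue of independent cloud tasks; efficiency uses the standard definition E = T(1) / (p * T(p)); and `instance_hour_cost` assumes per-instance-hour billing with partial hours rounded up, using a made-up hourly rate.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def process_task(record: str) -> int:
    # Hypothetical stand-in for one independent task. Here it counts G/C bases;
    # a real task might instead invoke an external assembler on one input file.
    return sum(1 for base in record if base in "GC")


def run_pleasingly_parallel(inputs: list[str], workers: int = 4) -> list[int]:
    # Map-only execution: every input is handled independently, so tasks can be
    # scheduled on any worker in any order. Threads suffice when each task
    # launches an external program; CPU-bound pure-Python work would use a
    # process pool instead. pool.map returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_task, inputs))


def parallel_efficiency(t_serial: float, t_parallel: float, p: int) -> float:
    # Standard parallel efficiency: E = T(1) / (p * T(p)); 1.0 is ideal scaling.
    return t_serial / (p * t_parallel)


def instance_hour_cost(wall_clock_hours: float, instances: int, hourly_rate: float) -> float:
    # Assumed billing model: each instance is charged per started hour.
    return math.ceil(wall_clock_hours) * instances * hourly_rate


if __name__ == "__main__":
    print(run_pleasingly_parallel(["GATTACA", "GGCC"], workers=2))  # [2, 4]
    print(parallel_efficiency(120.0, 40.0, 4))  # 0.75
    print(instance_hour_cost(2.5, 4, 0.5))  # 6.0 with a hypothetical rate
```

Under this billing assumption, a run that finishes just past an hour boundary pays for the whole next hour on every instance. That is one reason the best instance type and count can differ from the fastest configuration.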
doi:10.1145/1851476.1851544 dblp:conf/hpdc/GunarathneWQF10 fatcat:6fzi7lxe45cmdkg3elb2skrrjm