Portable Parallel Programming on Cloud and HPC: Scientific Applications of Twister4Azure

T. Gunarathne, Bingjing Zhang, Tak-Lon Wu, J. Qiu
2011 Fourth IEEE International Conference on Utility and Cloud Computing
Recent advancements in data-intensive computing for scientific discovery are fueling dramatic growth in the use of data-intensive iterative computations. The utility computing model introduced by cloud computing, combined with the rich set of cloud infrastructure services, offers a very attractive environment for scientists to perform such data-intensive computations. The challenges of large-scale distributed computation on clouds demand new computation frameworks that are specifically tailored for cloud characteristics in order to easily and effectively harness the power of clouds. Twister4Azure is a distributed, decentralized iterative MapReduce runtime for the Windows Azure cloud. It extends the familiar, easy-to-use MapReduce programming model with iterative extensions, enabling a wide array of large-scale iterative data analyses for scientific applications on the Azure cloud. This paper presents the applicability of Twister4Azure, highlighting its fault tolerance, efficiency, and simplicity. We study three data-intensive applications: two iterative scientific applications, Multi-Dimensional Scaling and KMeans Clustering, and one data-intensive, pleasingly parallel scientific application, BLAST+ sequence searching. Performance measurements show results comparable to, or a factor of 2 to 4 better than, traditional MapReduce runtimes deployed on up to 256 instances and for jobs with tens of thousands of tasks.
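To make the notion of iterative MapReduce concrete, the following is a minimal single-process sketch of KMeans Clustering written as a repeated map and reduce step. It is an illustration only and does not use or reproduce the Twister4Azure API: the function names (`kmeans_map`, `kmeans_reduce`, `iterative_kmeans`), the squared Euclidean distance, and the convergence test are assumptions chosen for the example.

```python
from collections import defaultdict


def kmeans_map(point, centroids):
    """Map step: emit (index of the nearest centroid, (point, 1))."""
    distances = [
        sum((p - c) ** 2 for p, c in zip(point, centroid))
        for centroid in centroids
    ]
    nearest = distances.index(min(distances))
    return nearest, (point, 1)


def kmeans_reduce(values):
    """Reduce step: average all points assigned to one centroid."""
    count = sum(n for _, n in values)
    dims = len(values[0][0])
    return tuple(sum(pt[d] for pt, _ in values) / count for d in range(dims))


def iterative_kmeans(points, centroids, max_iterations=20, tolerance=1e-9):
    """Driver loop: rerun map and reduce until the centroids stop moving.

    The centroids are the small, changing data passed into every iteration;
    the points are the large, static data reused across iterations.
    """
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iterations):
        # Shuffle: group the map outputs by centroid index.
        groups = defaultdict(list)
        for point in points:
            key, value = kmeans_map(point, centroids)
            groups[key].append(value)

        # A centroid that received no points keeps its previous position.
        new_centroids = [
            kmeans_reduce(groups[i]) if i in groups else centroids[i]
            for i in range(len(centroids))
        ]

        # Largest squared movement of any centroid in this iteration.
        shift = max(
            sum((a - b) ** 2 for a, b in zip(old, new))
            for old, new in zip(centroids, new_centroids)
        )
        centroids = new_centroids
        if shift <= tolerance:
            break
    return centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(iterative_kmeans(data, [(0, 0), (10, 10)]))
```

In a distributed runtime the map calls would run in parallel over partitions of the input, and the loop-invariant points would typically be kept close to the workers between iterations rather than reloaded; this sketch keeps everything in one process to show only the control flow.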
doi:10.1109/ucc.2011.23 dblp:conf/ucc/GunarathneZWQ11 fatcat:2vf5gmxmvncufmxkptsoqvkhmi