Performance Evaluation of MapReduce Applications on Cloud Computing Environment, FutureGrid [chapter]

Yunhee Kang, Geoffrey C. Fox
2011 Communications in Computer and Information Science  
This paper describes the results of a performance evaluation of two kinds of MapReduce applications running on FutureGrid: a data-intensive application and a computation-intensive application. For this work, we construct a virtualized cluster system made of a set of VM instances. We observe that the overall performance of the data-intensive application is strongly affected by the configuration of the VMs. This observation can be used to identify the bottleneck of the MapReduce application running on the [...]lized cluster system with various VM instances.

[...] needs to be solved since only the map and reduce functions need to be implemented, and the framework takes care of computing [...] the programmer has to deal with lower-level mechanisms to control the data flow [2,4].

Twister

There are several existing implementations of MapReduce, such as Hadoop [6] and Sphere [7]. Twister is another: an enhanced MapReduce runtime with an extended programming model that supports iterative MapReduce computations efficiently [8]. In addition, it extends the MapReduce programming model with broadcast- and scatter-type data transfers. These improvements allow Twister to support iterative MapReduce computations more efficiently than other MapReduce runtimes. It reads data from the local disks of the worker nodes and handles the intermediate data in the distributed memory of the worker nodes. All communication and data transfers are performed via NaradaBrokering, an open-source, distributed publish/subscribe messaging infrastructure [9]. Twister uses this messaging infrastructure to handle four types of communication: (i) sending and receiving control events, (ii) sending data from the client-side driver to the Twister daemons, (iii) transferring intermediate data between map and reduce tasks, and (iv) sending the outputs of the reduce tasks back to the client-side driver to invoke the combine operation.
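The iterative pattern described above (broadcast of loop-variant data, map over locally held partitions, reduce, then a combine step at the driver) can be illustrated with a minimal in-process sketch. It does not use Twister's API or NaradaBrokering: plain Python lists and dictionaries stand in for worker-local data and for the messaging layer. The one-dimensional K-means workload and every name in it (map_task, reduce_task, combine, iterative_mapreduce) are assumptions chosen only for illustration.

```python
from collections import defaultdict


def map_task(partition, centroids):
    """Map: tag each point of one static partition with its nearest centroid."""
    pairs = []
    for x in partition:
        nearest = min(range(len(centroids)), key=lambda i: abs(x - centroids[i]))
        pairs.append((nearest, (x, 1)))  # (key, (partial sum, count))
    return pairs


def reduce_task(key, values):
    """Reduce: average all points assigned to one centroid."""
    total = sum(v[0] for v in values)
    count = sum(v[1] for v in values)
    return key, total / count


def combine(reduced, old_centroids):
    """Combine: merge the reduce outputs at the driver into the next centroids."""
    new_centroids = list(old_centroids)  # centroids with no points keep their value
    for key, value in reduced:
        new_centroids[key] = value
    return new_centroids


def iterative_mapreduce(partitions, centroids, max_iters=20, tol=1e-9):
    """Driver loop: repeat map, reduce, and combine until the centroids stop moving."""
    for _ in range(max_iters):
        # "Broadcast": every map task receives the same current centroids.
        intermediate = defaultdict(list)
        for partition in partitions:  # partitions stay fixed across iterations
            for key, value in map_task(partition, centroids):
                intermediate[key].append(value)  # stands in for map-to-reduce transfer
        reduced = [reduce_task(k, vs) for k, vs in intermediate.items()]
        new_centroids = combine(reduced, centroids)
        shift = max(abs(a - b) for a, b in zip(new_centroids, centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids
```

For example, iterative_mapreduce([[1.0, 2.0], [9.0, 10.0], [1.5, 9.5]], [0.0, 5.0]) converges to [1.5, 9.5]. Only the small centroid list changes between iterations, while the partitions are reused unchanged; keeping such static data on the workers is the kind of saving an iterative runtime aims for.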
doi:10.1007/978-3-642-27180-9_10 fatcat:kd5u4rtgtjd73gec3nhizkvg7u