Improving MapReduce Performance through Process Migration

Rahul R. Ghule, Sachine N. Deshmukh
2015 International Journal of Engineering Research and  
MapReduce is a widely used programming model for processing large amounts of data, and Hadoop is an open-source implementation of the MapReduce framework that computes over large data sets in less time. The performance of Hadoop depends on metrics such as job execution time and cluster throughput. In MapReduce, a job is divided into multiple map and reduce tasks, and a node in a Hadoop cluster is expected to run multiple processes. Some processes can execute slowly due to internal or external reasons. Such a slow process prolongs job execution time, which degrades Hadoop MapReduce's performance. To overcome this, various strategies have been proposed, such as speculative execution and scheduling. In speculative execution, each slow task is backed up on another node in order to reduce the job execution time. These slow tasks are called straggler tasks. However, current strategies do not take a node's health into consideration. A node in the cluster, rather than a process, may be the straggler. If a node becomes a straggler, it leads to poor performance of the current MapReduce process. Our aim is to assess the performance of a node by calculating its CPU load. If the load is greater than a threshold value, the node is considered a straggler node, and the current process on that node is backed up on another node for faster execution. Experimental results show that our system improves MapReduce performance by 12.91%.
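The threshold rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 0.80 threshold, the representation of CPU load as a fraction of capacity, and the choice of the least-loaded healthy node as the backup target are all assumptions made for illustration. The abstract only states that the process is backed up on another node.

```python
from typing import Dict, List, Optional

# Hypothetical threshold: CPU load (as a fraction of capacity) above which
# a node is treated as a straggler. The abstract does not give a value.
CPU_LOAD_THRESHOLD = 0.80


def find_straggler_nodes(cpu_load: Dict[str, float],
                         threshold: float = CPU_LOAD_THRESHOLD) -> List[str]:
    """Return the nodes whose CPU load is greater than the threshold."""
    return sorted(node for node, load in cpu_load.items() if load > threshold)


def pick_backup_node(cpu_load: Dict[str, float],
                     threshold: float = CPU_LOAD_THRESHOLD) -> Optional[str]:
    """Pick the least-loaded non-straggler node (an assumed selection policy)."""
    healthy = {n: load for n, load in cpu_load.items() if load <= threshold}
    if not healthy:
        return None
    # Break ties by node name so the result is deterministic.
    return min(healthy, key=lambda n: (healthy[n], n))


def plan_backups(cpu_load: Dict[str, float],
                 threshold: float = CPU_LOAD_THRESHOLD) -> Dict[str, str]:
    """Map each straggler node to the node that should run its backup process."""
    target = pick_backup_node(cpu_load, threshold)
    if target is None:
        return {}
    return {node: target for node in find_straggler_nodes(cpu_load, threshold)}


# Illustrative load readings, not measurements from the paper.
sample_load = {"node1": 0.95, "node2": 0.30, "node3": 0.55}
backup_plan = plan_backups(sample_load)
```

With the sample readings above, only `node1` exceeds the assumed threshold, so its current process would be backed up on `node2`, the least-loaded remaining node. In a real cluster the load values would come from a monitoring source rather than a hard-coded dictionary.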
doi:10.17577/ijertv4is070392 fatcat:nwmrbbkz5nhihib77skbsswk7u