Improving Performance in Hadoop Using Automatic and Predictive Configuration

Juan Fang, Hao Sun, Li-Fu Zhou, Xing-Tian Ren, Min Cai
2016 Proceedings of the 2016 International Conference on Computer Engineering and Information Systems   unpublished
MapReduce is an effective programming model for analyzing large-scale data, and Hadoop, a distributed processing system, is widely used today. Improving task parallelism can be key to improving MapReduce performance in Hadoop. In this paper, we address the problem in two ways. First, we run tasks with dynamic configurations. Second, to account for differences among tasktrackers, we use a mathematical method to predict each tasktracker's CPU utilization and assign tasks accordingly. Experimental results for both approaches show that improving task parallelism improves performance in Hadoop.
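The abstract does not name the mathematical method used to predict tasktracker CPU utilization. Purely as an illustration of the idea, the sketch below assumes simple exponential smoothing over recent utilization samples and assigns the next task to the tasktracker with the lowest forecast. The function names, the smoothing factor `alpha`, and the sample data are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def predict_cpu(samples: List[float], alpha: float = 0.5) -> float:
    """Forecast the next CPU utilization (0..1) by exponential smoothing.

    Assumption: the paper's actual prediction method is not given in the
    abstract; exponential smoothing is a stand-in chosen for illustration.
    """
    if not samples:
        raise ValueError("need at least one utilization sample")
    forecast = samples[0]
    for value in samples[1:]:
        # Blend the newest observation with the running forecast.
        forecast = alpha * value + (1 - alpha) * forecast
    return forecast


def pick_tasktracker(history: Dict[str, List[float]], alpha: float = 0.5) -> str:
    """Return the name of the tracker with the lowest predicted utilization."""
    return min(history, key=lambda name: predict_cpu(history[name], alpha))


# Hypothetical utilization histories, oldest sample first.
history = {
    "tracker-a": [0.80, 0.85, 0.90],  # load rising
    "tracker-b": [0.60, 0.40, 0.20],  # load falling
}
chosen = pick_tasktracker(history)
```

With these made-up samples the forecast favors the tracker whose load is falling, which is the kind of decision a utilization-aware scheduler would make instead of treating all tasktrackers as identical.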
doi:10.2991/ceis-16.2016.54 fatcat:7hh6sbri45cltn43hymkvexjxm