CooMR

Xiaobing Li, Yandong Wang, Yizheng Jiao, Cong Xu, Weikuan Yu
2013 Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis - SC '13
Hadoop is a widely adopted open source implementation of the MapReduce programming model for big data processing. It represents system resources as available map and reduce slots and assigns them to various tasks. This execution model gives little regard to cross-task coordination on the use of shared system resources on a compute node, which results in task interference. In addition, the existing Hadoop merge algorithm can cause excessive I/O. In this study, we undertake an effort to address both issues. Accordingly, we have designed a cross-task coordination framework called CooMR for efficient data management in MapReduce programs. CooMR consists of three component schemes: cross-task opportunistic memory sharing and log-structured I/O consolidation, which are designed to facilitate task coordination, and the key-based in-situ merge (KISM) algorithm, which is designed to enable the sorting/merging of Hadoop intermediate data without actually moving the key-value pairs. Our evaluation demonstrates that CooMR is able to increase task coordination, improve system resource utilization, and significantly shorten the execution time of MapReduce programs.
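The general idea of merging sorted data by key while leaving the pairs in place can be illustrated with a minimal sketch. This is not the KISM algorithm from the paper, whose details the abstract does not give; it only shows, under assumed in-memory segments, how a merge can pass around keys and small pointers instead of the value payloads. The names `merge_by_key` and `read_value` are hypothetical.

```python
import heapq


def merge_by_key(segments):
    """Merge sorted segments by key and return (key, segment_id, index) pointers.

    Assumption: each segment is a list of (key, value) pairs already sorted
    by key. Only keys and integer pointers pass through the merge; the value
    payloads stay where they are inside `segments`.
    """
    def pointers(seg_id, segment):
        # Emit one lightweight pointer per pair; the value is never copied.
        for idx, (key, _value) in enumerate(segment):
            yield (key, seg_id, idx)

    streams = [pointers(i, seg) for i, seg in enumerate(segments)]
    # heapq.merge performs a k-way merge over already-sorted inputs.
    return list(heapq.merge(*streams))


def read_value(segments, pointer):
    """Follow a pointer back to the value in its original location."""
    _key, seg_id, idx = pointer
    return segments[seg_id][idx][1]


if __name__ == "__main__":
    segs = [[("a", "v1"), ("c", "v3")], [("b", "v2"), ("c", "v4")]]
    for ptr in merge_by_key(segs):
        print(ptr, read_value(segs, ptr))
```

In this sketch the merged order is materialized as a list of pointers, and values are read only when a consumer follows a pointer. A real system would keep the values on disk or in shared buffers rather than in Python lists.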
doi:10.1145/2503210.2503276 dblp:conf/sc/LiWJXY13 fatcat:ngietg45fnc2jbehq67od2463m