Cracking Down MapReduce Failure Amplification through Analytics Logging and Migration

Yandong Wang, Huansong Fu, Weikuan Yu
2015 IEEE International Parallel and Distributed Processing Symposium, 2015
MapReduce is popular for big data analytics because it offers easy-to-use map and reduce user interfaces while hiding the complexity of system scalability and fault resiliency issues. While a large body of literature has focused on improving the performance and scalability of MapReduce, the issue of fault resiliency has thus far received little attention. In this paper, we investigate the fault resiliency of MapReduce using YARN (the next-generation Hadoop) as a case study.
We reveal that the failures of a MapTask, a ReduceTask, or a compute node can have distinctly different impacts on MapReduce programs. In particular, YARN MapReduce is not able to gracefully handle failures that involve ReduceTasks, causing prolonged task execution, delayed job completion, and, more severely, failure amplification due to cascading effects on other tasks. Together, these problems cause the performance collapse of MapReduce jobs. In this paper, we introduce a new fault-tolerant framework that can crack down on failure amplification and gracefully handle failure scenarios. It is designed with two key fault handling techniques: analytics logging and speculative fast migration. Analytics logging is a light-weight mechanism that logs the key progress information of MapReduce tasks; speculative fast migration handles node failures by proactively re-executing MapTasks, migrating ReduceTasks, and collective merging with a pipeline of shuffle/merge and reduce stages. Our performance evaluation demonstrates that these techniques can eliminate failure amplification and deliver fast job execution compared to the existing task re-execution mechanism in MapReduce.
doi:10.1109/ipdps.2015.111 dblp:conf/ipps/WangFY15 fatcat:hoo7elreo5d3hntvev76nhuemi