i$^2$ MapReduce: Incremental MapReduce for Mining Evolving Big Data

Yanfeng Zhang, Shimin Chen, Qiang Wang, Ge Yu
2015 IEEE Transactions on Knowledge and Data Engineering  
As new data and updates are constantly arriving, the results of data mining applications become stale and obsolete over time. Incremental processing is a promising approach to refreshing mining results. It utilizes previously saved states to avoid the expense of re-computation from scratch. In this paper, we propose i$^2$ MapReduce, a novel incremental processing extension to MapReduce, the most widely used framework for mining big data. Compared with the state-of-the-art work on Incoop, i$^2$ MapReduce (i) performs key-value pair level incremental processing rather than task level re-computation, (ii) supports not only one-step computation but also more sophisticated iterative computation, which is widely used in data mining applications, and (iii) incorporates a set of novel techniques to reduce I/O overhead for accessing preserved fine-grain computation states. We evaluate i$^2$ MapReduce using a one-step algorithm and four iterative algorithms with diverse computation characteristics. Experimental results on Amazon EC2 show significant performance improvements of i$^2$ MapReduce compared to both plain and iterative MapReduce performing re-computation.
doi:10.1109/tkde.2015.2397438 fatcat:pb7itxwyq5gzte5ir7ah6bjmju
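
The idea of key-value pair level incremental processing described in the abstract can be illustrated with a minimal in-memory sketch. This is not the paper's implementation: the class `IncrementalJob`, the functions `map_fn` and `reduce_fn`, and the `('+' | '-', record_id, text)` delta format are all hypothetical names chosen for illustration, word count stands in for a one-step algorithm, and a Python dictionary stands in for the preserved fine-grain state that the real system keeps on disk.

```python
from collections import defaultdict


def map_fn(record_id, text):
    # Hypothetical map function: emit (word, 1) for every word in the record.
    return [(word, 1) for word in text.split()]


def reduce_fn(key, values):
    # Hypothetical reduce function: sum the counts for one key.
    return sum(values)


class IncrementalJob:
    """Toy one-step job that preserves map outputs per key and per record."""

    def __init__(self):
        # Preserved state (assumption: in memory): key -> {record_id: [values]}
        self.state = defaultdict(dict)
        self.results = {}

    def apply_delta(self, delta):
        """Apply a list of (op, record_id, text) changes, op being '+' or '-'.

        Only keys touched by the delta are re-reduced; all other results
        are reused as-is. Returns the set of touched keys.
        """
        touched = set()
        for op, record_id, text in delta:
            for key, value in map_fn(record_id, text):
                if op == '+':
                    self.state[key].setdefault(record_id, []).append(value)
                else:
                    # Deletion: drop this record's contribution to the key.
                    self.state[key].pop(record_id, None)
                touched.add(key)

        for key in touched:
            values = [v for vs in self.state[key].values() for v in vs]
            if values:
                self.results[key] = reduce_fn(key, values)
            else:
                # No record contributes to this key any more.
                self.results.pop(key, None)
                del self.state[key]
        return touched


job = IncrementalJob()
job.apply_delta([('+', 1, 'a b a'), ('+', 2, 'b c')])      # initial run
touched = job.apply_delta([('-', 2, 'b c'), ('+', 3, 'c d')])  # refresh
```

In the refresh step only the keys `b`, `c`, and `d` are re-reduced, while the result for `a` is carried over untouched. A task-level approach would instead re-run every task whose input split changed, including records in that split that did not change.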