PrIter: A Distributed Framework for Prioritizing Iterative Computations

Yanfeng Zhang, Qixin Gao, Lixin Gao, Cuirong Wang
2013 IEEE Transactions on Parallel and Distributed Systems  
Iterative computations are pervasive among data analysis applications, including Web search, online social network analysis, recommendation systems, and so on. These applications typically involve data sets of massive scale, so fast convergence of the iterative computations is essential. In this paper, we explore the opportunity for accelerating iterative computations by prioritization. Instead of performing computations on all data points without discrimination, we prioritize the computations that help convergence the most, so that the convergence speed of the iterative process is significantly improved. We develop a distributed computing framework, PrIter, which supports the prioritized execution of iterative computations. PrIter either stores intermediate data in memory for fast convergence or stores intermediate data in files for scaling to larger data sets. We evaluate PrIter on a local cluster of machines as well as on the Amazon EC2 cloud. The results show that PrIter achieves up to 50x speedup over Hadoop for a series of iterative algorithms. In addition, PrIter shows better performance for iterative computations than other state-of-the-art distributed frameworks such as Spark and Piccolo.
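The idea of prioritized iteration described in the abstract can be illustrated with a small single-machine sketch. This is not PrIter's API or implementation; the function name `prioritized_pagerank`, its parameters, and the delta-accumulating PageRank formulation are all assumptions chosen for illustration. In each round, only the few nodes with the largest pending change are updated, rather than every node.

```python
import heapq


def prioritized_pagerank(graph, damping=0.8, batch_size=2, tol=1e-9,
                         max_rounds=100_000):
    """Hypothetical sketch: unnormalized PageRank with prioritized updates.

    graph maps each node to a list of its out-neighbors. Every node keeps a
    pending change (delta); each round processes only the batch_size nodes
    whose delta is largest, since they contribute most to convergence.
    """
    nodes = list(graph)
    rank = {v: 0.0 for v in nodes}
    delta = {v: 1.0 - damping for v in nodes}  # initial pending change
    updates = 0

    for _ in range(max_rounds):
        if sum(delta.values()) < tol:
            break  # remaining pending change is negligible
        # Priority selection: nodes with the largest pending change.
        selected = heapq.nlargest(batch_size, nodes, key=lambda v: delta[v])
        for v in selected:
            d = delta[v]
            delta[v] = 0.0
            rank[v] += d  # absorb the pending change
            updates += 1
            out = graph[v]
            for w in out:  # propagate a damped share to each neighbor
                delta[w] += damping * d / len(out)
    return rank, updates


if __name__ == "__main__":
    cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
    ranks, n_updates = prioritized_pagerank(cycle)
    print(ranks, n_updates)
```

On the three-node cycle in the usage example, every node should converge to the same unnormalized rank by symmetry. The batch size controls the trade-off between selectivity and per-round overhead; the value used here is arbitrary.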
doi:10.1109/tpds.2012.272 fatcat:kugnwaenfncpxa33lr7ohwia74