Efficient PageRank approximation via graph aggregation

Andrei Z. Broder, Ronny Lempel, Farzin Maghoul, Jan Pedersen
2004 Proceedings of the 13th international World Wide Web conference on Alternate track papers & posters - WWW Alt. '04  
We present a framework for approximating random-walk based probability distributions over Web pages using graph aggregation. The basic idea is to partition the graph into classes of quasi-equivalent vertices, to project the page-based random walk to be approximated onto those classes, and to compute the stationary probability distribution of the resulting class-based random walk. From this distribution we can quickly reconstruct a distribution on pages. In particular, our framework can approximate the well-known PageRank distribution by setting the classes according to the set of pages on each Web host. We experimented on a Web-graph containing over 1.4 billion pages and over 6.6 billion links from a crawl of the Web conducted by AltaVista in September 2003. We were able to produce a ranking that has Spearman rank-order correlation of 0.95 with respect to PageRank. The clock time required by a simplistic implementation of our method was less than half the time required by a highly optimized implementation of PageRank, implying that larger speedup factors are probably possible.

Keywords: Web IR · Citation and link analysis

* Significant portions of the work presented here were done while A. Broder and R. Lempel were employed by the AltaVista corporation.
doi:10.1145/1013367.1013537 dblp:conf/www/BroderLMP04 fatcat:srdwtq65g5enblxefzllvctasy
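
The aggregation idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `aggregate_rank`, the damping value, the uniform handling of hosts without out-links, and especially the rule for splitting a host's score among its pages (proportional to in-link count plus one) are all assumptions made here for illustration. The abstract does not specify how the page distribution is reconstructed. The sketch also assumes every link endpoint appears in `host_of`.

```python
from collections import defaultdict


def aggregate_rank(edges, host_of, damping=0.85, iters=100):
    """Rank hosts with a PageRank-style walk, then split each host's score among its pages."""
    pages = sorted(host_of)
    hosts = sorted(set(host_of.values()))

    # Project page links onto classes: weight[h][g] counts links from host h to host g.
    weight = {h: defaultdict(float) for h in hosts}
    in_links = defaultdict(int)
    for src, dst in edges:
        weight[host_of[src]][host_of[dst]] += 1.0
        in_links[dst] += 1

    # Power iteration for the stationary distribution of the host-level walk.
    rank = {h: 1.0 / len(hosts) for h in hosts}
    for _ in range(iters):
        nxt = {h: (1.0 - damping) / len(hosts) for h in hosts}
        for h in hosts:
            total = sum(weight[h].values())
            if total == 0:
                # Host with no out-links: spread its mass uniformly (assumption).
                for g in hosts:
                    nxt[g] += damping * rank[h] / len(hosts)
            else:
                for g, w in weight[h].items():
                    nxt[g] += damping * rank[h] * w / total
        rank = nxt

    # Reconstruct page scores: split each host's mass by (in-links + 1) share (assumption).
    host_mass = defaultdict(float)
    for p in pages:
        host_mass[host_of[p]] += in_links[p] + 1
    return {p: rank[host_of[p]] * (in_links[p] + 1) / host_mass[host_of[p]] for p in pages}
```

Because the iteration runs over hosts rather than pages, its cost depends on the number of hosts and host-to-host link pairs, which is where a speedup over page-level PageRank would be expected to come from.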