Approximating Personalized PageRank with Minimal Use of Web Graph Data

David Gleich, Marzia Polito
2006 Internet Mathematics  
In this paper, we consider the problem of calculating fast and accurate approximations to the personalized PageRank score ([8, 16]) of a webpage. We focus on techniques to improve speed by limiting the amount of webgraph data we need to access. PageRank scores are mainly used for ranking purposes, and generally only the scores exceeding a given threshold are relevant. In practice, and relative to the size of the web, only a small number of pages have a non-negligible personalized PageRank
more » ... We capitalize on this property of the PageRank score to reduce the amount of webgraph data needed for computation. Our algorithms provide both the approximation to the personalized PageRank score as well as guidance in using only the necessary information -and therefore sensibly reduce not only the computational cost of the algorithm, but also the memory and memory bandwidth requirements. Our algorithms assume that we have random access to a sparse representation of a webgraph (perhaps by crawling the web if necessary); some of them further require knowledge of a hostgraph ([19]). All of them provide a fast approximate calculation of the personalized PageRank score for pages where the score exceeds a given threshold. We report experiments with these algorithms on webgraphs of up to 118 million pages and prove theoretical approximation bound for all. We conclude by proposing an application scenario of personalized Web Search that inspired and motivated our work. * Work performed during a summer internship with Intel Corporation.
doi:10.1080/15427951.2006.10129128 fatcat:upj2qb3z35cp7gqgw74ndjyxny