Stability and Similarity of Link Analysis Ranking Algorithms

Debora Donato, Stefano Leonardi, Panayiotis Tsaparas
2006 Internet Mathematics  
Recently, there has been a surge of research activity in the area of Link Analysis Ranking, where hyperlink structures are used to determine the relative authority of Web pages. One of the seminal works in this area is that of Kleinberg [15], who proposed the HITS algorithm. In this paper, we undertake a theoretical analysis of the properties of the HITS algorithm on a broad class of random graphs. Working within the framework of Borodin et al. [7], we prove that on this class (a) the HITS algorithm is stable with high probability, and (b) the HITS algorithm is similar to the INDEGREE heuristic that assigns to each node a weight proportional to its number of incoming links. We demonstrate that our results go through for the case where the expected in-degrees of the graph follow a power-law distribution, a situation observed in the actual Web graph [9]. We also study experimentally the similarity between HITS and INDEGREE, and we investigate the general conditions under which the two algorithms are similar.

Borodin et al. [7] considered the question of stability and similarity over an unrestricted class of graphs. They studied a variety of algorithms, and they proved that no pair of these algorithms is similar and that almost all of the algorithms are unstable. It appears that the class of all possible graphs is too broad to allow for positive results. This naturally raises the question of whether it is possible to prove positive results if we restrict ourselves to a smaller class of graphs.

Since the explosion of the Web, various stochastic models have been proposed for the Web graph [4, 5, 16, 3]. The model we consider, which was proposed by Azar et al. [4], is the following: assume that every node i in the graph comes with two parameters a_i and h_i, which take values in [0, 1]. For a node i, the value h_i can be thought of as the probability that node i is a good hub, while the value a_i is the probability that node i is a good authority. We then generate an edge from i to j with probability proportional to h_i a_j. We will refer to this model as the product model, and to the corresponding class of graphs as the class of product graphs. The product graph model generalizes the traditional random graph model of Erdős and Rényi [13] to include graphs where the expected degrees follow specific distributions. This is of particular interest since it is well known [16, 9] that the in-degrees of the nodes in the Web graph follow a power-law distribution.
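As an illustration of the product model described above, the following is a minimal sketch of how such a graph could be sampled. The function names, the `scale` constant of proportionality, the fixed seed, and the decision to skip self-loops are all assumptions made for this example; none of them is taken from the paper.

```python
import random


def sample_product_graph(h, a, scale=1.0, seed=0):
    """Sample a directed graph in which edge i -> j appears independently
    with probability scale * h[i] * a[j].

    `scale` is a hypothetical constant of proportionality; it should be
    chosen so that every probability stays within [0, 1].
    Self-loops are skipped, which is a simplification for this sketch.
    """
    rng = random.Random(seed)
    n = len(h)
    out_edges = [[] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and rng.random() < scale * h[i] * a[j]:
                out_edges[i].append(j)
    return out_edges


def in_degrees(out_edges):
    """Count incoming links for every node of an adjacency-list graph."""
    deg = [0] * len(out_edges)
    for targets in out_edges:
        for j in targets:
            deg[j] += 1
    return deg


if __name__ == "__main__":
    # Hypothetical parameters: node 2 has authority parameter 0,
    # so it can never receive a link.
    hubs = [1.0, 1.0, 1.0, 1.0]
    auths = [1.0, 1.0, 0.0, 1.0]
    graph = sample_product_graph(hubs, auths)
    print(in_degrees(graph))
```

Under this sketch the expected in-degree of node j is proportional to a_j, which is the property that lets the model express a prescribed expected degree distribution such as a power law.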
Our contribution. In this paper we study the behavior of the HITS algorithm, proposed by Kleinberg [15], on the class of product graphs. The study of HITS on product graphs was initiated by Azar et al. [4], who showed that under some assumptions the HITS algorithm returns weights that are very close to the authority parameters. We formalize the findings of Azar et al. [4] in the framework of Borodin et al. [7]. We extend the definitions of stability and similarity to classes of random graphs, and we demonstrate the link between stability and similarity. We then prove that, with high probability and under some restrictive assumptions, the HITS algorithm is stable on the class of product graphs and similar to the INDEGREE heuristic, which ranks pages according to their in-degree. This similarity result is the main contribution of the paper. Its implication is that on product graphs, with high probability, the HITS algorithm reduces to a simple in-degree count. We show that our assumptions are general enough to capture graphs where the expected degrees follow a power-law distribution like the one observed on the real Web. We also analyze the correlation between INDEGREE and HITS on a large sample of the Web graph. The experimental analysis reveals that similarity between HITS and INDEGREE can also be observed on the real Web. We conclude with a discussion of the conditions that guarantee similarity of HITS and INDEGREE for the class of all possible graphs.
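To make the comparison between HITS and INDEGREE concrete, here is a small sketch that computes both sets of authority weights on an adjacency-list graph and measures how far apart they are. The L1 normalization, the fixed iteration count, and the `l1_distance` helper are assumptions chosen for readability; the distance is only a stand-in and not the formal notion of similarity from the framework of Borodin et al. [7].

```python
def hits_authority(out_edges, iterations=50):
    """Authority weights from the HITS iteration, normalized to sum to 1.

    Each round sets every hub weight to the sum of the authority weights
    of the pages it links to, then every authority weight to the sum of
    the hub weights of the pages linking to it.
    """
    n = len(out_edges)
    auth = [1.0] * n
    for _ in range(iterations):
        hub = [sum(auth[j] for j in out_edges[i]) for i in range(n)]
        new_auth = [0.0] * n
        for i, targets in enumerate(out_edges):
            for j in targets:
                new_auth[j] += hub[i]
        total = sum(new_auth)
        if total == 0:
            return new_auth  # graph without edges: all weights are zero
        auth = [x / total for x in new_auth]
    return auth


def indegree_weights(out_edges):
    """INDEGREE weights: in-degree of each node, normalized to sum to 1."""
    deg = [0.0] * len(out_edges)
    for targets in out_edges:
        for j in targets:
            deg[j] += 1.0
    total = sum(deg)
    return [d / total for d in deg] if total else deg


def l1_distance(u, v):
    """Sum of absolute differences between two weight vectors."""
    return sum(abs(x - y) for x, y in zip(u, v))


if __name__ == "__main__":
    # Hypothetical toy graph: nodes 0 and 1 both link to nodes 2 and 3.
    edges = [[2, 3], [2, 3], [], []]
    print(hits_authority(edges))
    print(indegree_weights(edges))
    print(l1_distance(hits_authority(edges), indegree_weights(edges)))
```

On this toy graph the adjacency matrix has rank one, and the two weight vectors coincide. That is a very special case of the behavior the similarity result describes, not evidence for it.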
doi:10.1080/15427951.2006.10129130 fatcat:xhrta2oxcvewnmug2es6bnyceu