The Sinkhorn–Knopp Algorithm: Convergence and Applications

Philip A. Knight
SIAM Journal on Matrix Analysis and Applications, 2008
As long as a square nonnegative matrix A contains sufficient nonzero elements (precisely, has total support), the Sinkhorn–Knopp algorithm can be used to balance the matrix, that is, to find a diagonal scaling of A that is doubly stochastic. It is known that the convergence is linear, and an upper bound has been given for the rate of convergence for positive matrices. In this paper we give an explicit expression for the rate of convergence for fully indecomposable matrices. We describe how balancing algorithms can be
… to give a measure of web page significance. We compare the measure with some well-known alternatives, including PageRank. We show that, with an appropriate modification, the Sinkhorn–Knopp algorithm is a natural candidate for computing the measure on enormous data sets.

Key words. Matrix balancing, Sinkhorn–Knopp algorithm, PageRank, doubly stochastic matrix.

AMS subject classifications. 15A48, 15A51, 65F15, 65F35.

Introduction. If a graph has the appropriate structure, we can generate a random walk on it by taking its connectivity matrix and applying a suitable scaling to transform it into a stochastic matrix. This simple idea has a wide range of applications. In particular, we can rank pages on the internet by generating the appropriate connectivity matrix, G, and applying a scaling induced by a diagonal matrix, D, of column sums so that $P_c = GD^{-1}$ is column stochastic. Ordering pages according to the size of the components in the stationary distribution of $P_c$ gives us a ranking. Roughly speaking, this is how Google's PageRank is derived.

An alternative method of generating a random walk on G is to apply a diagonal scaling to both sides of G to form a doubly stochastic matrix $P = DGE$. Of course, if we use this approach, the stationary distribution is useless for ranking purposes: every doubly stochastic matrix has the uniform vector as a stationary distribution, so it cannot distinguish one page from another. However, in §5 we argue that the entries of D and E can be used as alternative measures. We will also see that if we apply the Sinkhorn–Knopp (SK) algorithm to an appropriate matrix to find D and E, we can compute our new ranking at a cost comparable to that of finding the PageRank. In order to justify this conclusion, we need to establish the rate of convergence of the SK algorithm, which we …
doi:10.1137/060659624
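The balancing described in the abstract and introduction can be illustrated with a minimal sketch of the plain Sinkhorn–Knopp iteration, assuming NumPy and a square nonnegative matrix that admits a balancing (here a strictly positive one). The function name `sinkhorn_knopp`, its parameters `tol` and `max_iter`, and the stopping test on row sums are illustrative choices, not taken from the paper; this is the basic alternating row/column scaling, not the modified version the abstract mentions for enormous data sets. The vectors `r` and `c` play the roles of the diagonals of D and E in $P = DGE$, and the last lines check that the uniform vector is stationary for the balanced matrix, which is why that distribution carries no ranking information.

```python
import numpy as np


def sinkhorn_knopp(A, tol=1e-10, max_iter=10_000):
    """Alternately rescale rows and columns of A until diag(r) @ A @ diag(c)
    is (numerically) doubly stochastic, and return the scaling vectors r, c.

    Hypothetical helper: assumes A is square, nonnegative and balanceable
    (for example strictly positive), so A @ c and A.T @ r stay positive.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    c = np.ones(n)
    for _ in range(max_iter):
        r = 1.0 / (A @ c)    # row step: every row sum of diag(r) A diag(c) becomes 1
        c = 1.0 / (A.T @ r)  # column step: every column sum becomes 1 (rows may drift)
        P = r[:, None] * A * c[None, :]
        # After the column step the column sums are exact, so only test the rows.
        if np.max(np.abs(P.sum(axis=1) - 1.0)) < tol:
            return r, c
    raise RuntimeError("no convergence within max_iter iterations")


# Example on a small positive matrix (illustrative data, not from the paper).
A = np.array([[1.0, 2.0], [3.0, 4.0]])
r, c = sinkhorn_knopp(A)
P = r[:, None] * A * c[None, :]      # P = D A E with D = diag(r), E = diag(c)
print(P.sum(axis=0), P.sum(axis=1))  # both should be close to 1

# For any doubly stochastic P the uniform vector x satisfies P x = x,
# so this stationary distribution gives every page the same score.
x = np.full(2, 0.5)
print(np.allclose(P @ x, x))  # True
```

For a fully indecomposable A, the scaling vectors are unique only up to a reciprocal scalar factor (r·t and c/t give the same P), so any measure built from the entries of D and E should compare their relative sizes.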