Efficient Pairwise Penetrating-rank Similarity Retrieval

Weiren Yu, Julie McCann, Chengyuan Zhang
2019 ACM Transactions on the Web  
Many web applications, such as collaborative filtering, web document ranking, linkage prediction, and anomaly detection, demand a measure of similarity between two entities. P-Rank (Penetrating-Rank) has been accepted as a promising graph-based similarity measure, as it provides a comprehensive way of encoding both incoming and outgoing links into the assessment. However, the existing method for computing P-Rank is iterative in nature and rather cost-prohibitive. Moreover, the accuracy estimate and
... ity issues for P-Rank computation have not been addressed. In this paper, we consider optimization techniques for P-Rank search that encompass its accuracy, stability, and computational efficiency. (1) An accuracy estimate is provided for P-Rank iterations, with the aim of finding the number of iterations, k, required to guarantee a desired accuracy. (2) A rigorous bound on the condition number of P-Rank is obtained for stability analysis. Based on this bound, it can be shown that P-Rank is stable and well-conditioned when the damping factors are chosen to be suitably small. (3) Two matrix-based algorithms, applicable to digraphs and undirected graphs respectively, are devised for efficient P-Rank computation. They improve the computational time from O(kn^3) to O(υn^2 + υ^6) for digraphs, and to O(υn^2) for undirected graphs, where n is the number of vertices in the graph, and υ (≪ n) is the target rank of the graph. Moreover, our proposed algorithms can significantly reduce the memory space of P-Rank computations from O(n^2) to O(υn + υ^4) for digraphs, and to O(υn) for undirected graphs. Finally, extensive experiments on real-world and synthetic datasets demonstrate the usefulness and efficiency of the proposed techniques for P-Rank similarity assessment on various networks.

... such settings of (C_in, C_out) will prevent P-Rank from achieving good stability. Thus, the results in this section motivate us to study another, non-iterative model (shown in Section 5) that does not produce iterative errors relying on (C_in, C_out) for P-Rank computation.

P2) We introduce the notion of the P-Rank condition number κ_∞ to analyze the stability of P-Rank (Section 4). We develop a new eigenvector-based approach to obtain a tight bound for κ_∞, and provide the conditions under which P-Rank is stable, that is, slight perturbations in the link structure will not cause large changes in the P-Rank similarity.
We provide a real application to show how the P-Rank condition number can be used to set appropriate hyper-parameters for P-Rank, which can improve the robustness and accuracy of decentralized P-Rank computation.

P3) We propose two novel matrix-based algorithms (DE P-Rank and UN P-Rank) that can substantially speed up the computation of P-Rank from O(kn^3) to O(υn^2 + υ^6) for digraphs, and to O(υn^2) for undirected graphs (Section 5) with guaranteed accuracy, where υ (≪ n) is the target rank of the graph. Besides, our proposed algorithms can significantly reduce the memory space of P-Rank computations from O(n^2) to O(υn + υ^4) for digraphs, and to O(υn) for undirected graphs. Our non-iterative algorithms in this section, unlike the iterative version in Section 3, do not produce iterative errors that hinge on the (C_in, C_out) settings. Thus, for our non-iterative model, by setting small values of (C_in, C_out), we can achieve both high computational efficiency and good stability at the same time.

We empirically verify the efficiency of our methods on real and synthetic data (Section 6). The experimental results show that (1) P-Rank converges exponentially w.r.t. the iteration number; (2) the stability of P-Rank is sensitive to different choices of the damping factors and the weighted factor; (3) the proposed DE P-Rank and UN P-Rank outperform their competitors by up to one order of magnitude, and scale well over large networks.
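To make the iterative baseline concrete, the following is a minimal sketch of the conventional P-Rank iteration in dense matrix form, using the standard P-Rank recurrence from the literature. It is not the DE P-Rank or UN P-Rank algorithm, whose details are not given in this text. The function names, the default parameter values, and the geometric error bound c^(k+1) with c = λ·C_in + (1−λ)·C_out used to pick k are assumptions of this sketch; the exact accuracy estimate in the paper may differ.

```python
import numpy as np


def row_normalize(m):
    """Divide each row by its sum; all-zero rows stay zero."""
    sums = m.sum(axis=1, keepdims=True)
    return np.divide(m, sums, out=np.zeros_like(m), where=sums > 0)


def iterations_for_accuracy(eps, c_in, c_out, lam):
    """Smallest k with c**(k+1) <= eps, where c = lam*c_in + (1-lam)*c_out.

    ASSUMPTION: a geometric error bound for the iteration, not quoted
    from the paper.
    """
    c = lam * c_in + (1.0 - lam) * c_out
    if not (0.0 < c < 1.0 and eps > 0.0):
        raise ValueError("need 0 < c < 1 and eps > 0")
    k = 0
    while c ** (k + 1) > eps:
        k += 1
    return k


def p_rank(adj, c_in=0.8, c_out=0.6, lam=0.5, eps=1e-3):
    """Naive iterative P-Rank; adj[i, j] = 1 means an edge i -> j.

    Each iteration performs dense n x n matrix products, so k iterations
    cost O(k n^3) time and O(n^2) memory.
    """
    adj = np.asarray(adj, dtype=float)
    n = adj.shape[0]
    q = row_normalize(adj.T)  # row a averages over the in-neighbors of a
    p = row_normalize(adj)    # row a averages over the out-neighbors of a
    s = np.eye(n)             # s_0(a, b) = 1 if a == b, else 0
    for _ in range(iterations_for_accuracy(eps, c_in, c_out, lam)):
        s = lam * c_in * (q @ s @ q.T) + (1.0 - lam) * c_out * (p @ s @ p.T)
        np.fill_diagonal(s, 1.0)  # every vertex is fully similar to itself
    return s
```

For a three-node graph with edges 0 -> 1 and 0 -> 2, nodes 1 and 2 share their only in-neighbor and have no out-neighbors, so with the defaults above the sketch returns λ·C_in = 0.4 for that pair. The dense products in each iteration are what a low-rank, non-iterative formulation is meant to avoid.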
doi:10.1145/3368616 fatcat:wercjvjztra4vbcbk4n77scfcy