A Topic Community Detection Method for Information Network based on Improved Label Propagation
International Journal of Hybrid Information Technology
The large number of emerging information networks brings new challenges to community detection. A meaningful community should be topic-oriented. However, topology-based methods reflect only the strength of connections and ignore the consistency of topics, while content-based methods focus on content and completely ignore the links. This paper explores simLPA, a topic-oriented community detection method for information networks based on label propagation. The method uses the Latent Dirichlet Allocation (LDA) topic model to represent node content and calculates content similarity by the normalized Kullback-Leibler divergence. simLPA, an extension of LabelRank, fuses links and content naturally to detect topic communities. Extensive experiments on nine real-world datasets of varying sizes and characteristics show that the proposed method outperforms the baseline algorithms in quality. Additionally, even with content integrated, simLPA is equivalent to LabelRank in efficiency, so it can readily handle large-scale information networks.

... network. In order to reduce the dimensionality of the nodes' content attributes, we apply the LDA topic model to represent the node content. Our approach fuses the links and the contents naturally in the process of normalizing the label propagation probability, which is defined by the content similarity between a node and its neighbors. We adopt modularity and purity to evaluate the quality of the topic communities. Through extensive experiments on real-world datasets drawn from WebKB, Cora and Wikipedia, we demonstrate the effectiveness and efficiency of our method. We find that simLPA detects topic communities of comparable or superior quality on most of these datasets.

This paper is organized as follows: Section 2 discusses related work; Section 3 presents the implementation details of simLPA; Section 4 introduces the datasets and reports the quantitative experimental results; and Section 5 summarizes the findings and identifies future research.
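The content-similarity weighting mentioned above, in which label propagation probabilities are normalized using the similarity between a node's topic distribution and those of its neighbors, can be illustrated with a minimal sketch. This is not the formulation given in Section 3: the symmetrised divergence, the mapping `1 / (1 + d)` into (0, 1], and the function names `kl_divergence`, `content_similarity` and `propagation_probabilities` are all assumptions made for illustration, and the topic vectors stand in for LDA output.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    eps guards against log(0) when a topic has zero probability.
    """
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def content_similarity(p, q):
    """Similarity in (0, 1]; 1 means identical topic distributions.

    Assumption: KL is symmetrised and squashed with 1 / (1 + d).
    The paper's own normalisation may differ.
    """
    sym = 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return 1.0 / (1.0 + max(0.0, sym))


def propagation_probabilities(node, neighbors, topics):
    """Normalise similarities to a node's neighbours so they sum to 1."""
    sims = {
        n: content_similarity(topics[node], topics[n])
        for n in neighbors[node]
    }
    total = sum(sims.values())
    if total == 0:  # isolated node: nothing to propagate from
        return {}
    return {n: s / total for n, s in sims.items()}


# Hypothetical two-topic distributions for three linked nodes.
topics = {"a": [0.9, 0.1], "b": [0.8, 0.2], "c": [0.1, 0.9]}
neighbors = {"a": ["b", "c"]}
probs = propagation_probabilities("a", neighbors, topics)
print(probs)
```

In this toy example node `a` gives more weight to labels arriving from `b`, whose topics resemble its own, than to labels from `c`, even though both are linked to it. That is the sense in which links and content are fused in a single propagation step.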