Congkai Sun, Bin Gao, Zhenfu Cao, Hang Li
2008 Proceedings of the Conference on Empirical Methods in Natural Language Processing - EMNLP '08 unpublished
Previously, topic models such as PLSI (Probabilistic Latent Semantic Indexing) and LDA (Latent Dirichlet Allocation) were developed for modeling the contents of plain texts. Recently, topic models for processing hypertexts such as web pages have also been proposed. These hypertext models are generative models that give rise to both words and hyperlinks. This paper points out that, to better represent the contents of hypertexts, it is more essential to assume that the hyperlinks are fixed and to define the topic model as that of generating words only. The paper then proposes a new topic model for hypertext processing, referred to as Hypertext Topic Model (HTM).
doi:10.3115/1613715.1613779 fatcat:ivvncihujzfb3jlfv3h6kpbab4