MB-ToT: An Effective Model for Topic Mining in Microblogs

Shaopeng Liu, Jian Yin, Jia Ouyang, Yun Huang, Piyuan Lin
2014 Applied Mathematics & Information Sciences  
Topic mining on microblogging sites such as Twitter, with their sheer scale of instant messages and social network information, is a challenging problem. Although many text mining techniques and generative probabilistic models have been developed for static plain-text corpora, they tend to give unsatisfactory results on microblogs because they ignore that microblogs are temporally sequential and tied to social network information. In this paper, we propose a novel topic ... l, MicroBlog-Topics over Time (MB-ToT), which aims for comprehensive topic analysis in microblogs. First, we assume each topic is a mixture distribution influenced by both word co-occurrences and the timestamps of microblogs. This allows MB-ToT to capture the changes of each topic over time. Next, we use users' intrinsic interests, social contact relations and #hashtags to improve the topic mining result. Finally, we present a Gibbs sampling implementation for the inference of MB-ToT. We evaluate MB-ToT and compare it with state-of-the-art methods on a real dataset. In our experiments, MB-ToT outperforms the state-of-the-art methods by a large margin in terms of both perplexity and KL-divergence. We also show that the quality of the latent topics generated by MB-ToT is promising.
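The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the standard textbook definitions of perplexity and KL-divergence; the abstract does not state how the paper applies them, and the function names `perplexity` and `kl_divergence` are hypothetical, not taken from the paper.

```python
import math


def perplexity(token_log_probs):
    """Perplexity of held-out text: exp of the negative mean log-likelihood per token.

    token_log_probs: natural-log probabilities a model assigns to each held-out token.
    Lower values mean the model predicts the held-out text better.
    """
    if not token_log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same support.

    Terms with p_i == 0 contribute nothing; q_i must be positive wherever p_i is.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Illustrative values only (not results from the paper).
uniform_over_four = [math.log(0.25)] * 4
print(perplexity(uniform_over_four))
print(kl_divergence([0.5, 0.5], [0.25, 0.75]))
```

A model that assigns every token probability 1/4 has perplexity 4, and the KL-divergence of a distribution from itself is 0, which gives two quick sanity checks for any implementation of these metrics.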
doi:10.12785/amis/080137 fatcat:knw4rqmpozczpimrgb7oftso4a