Search Result Diversification in Short Text Streams

Shangsong Liang, Emine Yilmaz, Hong Shen, Maarten De Rijke, W. Bruce Croft
2017 ACM Transactions on Information Systems  
We consider the problem of search result diversification for streams of short texts. Diversifying search results in short text streams is more challenging than in the case of long documents, as it is difficult to capture the latent topics of short documents. To capture the changes of topics and the probabilities of documents for a given query at a specific time in a short text stream, we propose a dynamic Dirichlet multinomial mixture topic model, called D2M3, as well as a Gibbs sampling algorithm for the inference. We also propose a streaming diversification algorithm, SDA, that integrates the information captured by D2M3 with our proposed modified version of the PM-2 (Proportionality-based diversification Method, second version) diversification algorithm. We conduct experiments on a Twitter dataset and find that SDA statistically significantly outperforms state-of-the-art non-streaming retrieval methods, plain streaming retrieval methods, as well as streaming diversification methods that use other dynamic topic models.
doi:10.1145/3057282 fatcat:magvfcd3xrgmxh6g35ocu3c4ba