Improving context-aware query classification via adaptive self-training

Minmin Chen, Jian-Tao Sun, Xiaochuan Ni, Yixin Chen
Proceedings of the 20th ACM International Conference on Information and Knowledge Management (CIKM '11), 2011
Topical classification of user queries is critical for general-purpose web search systems. It is also a challenging task, because query terms are sparse and labeled queries are scarce. On the other hand, the search contexts embedded in query sessions and the unlabeled queries freely available on the web have not been fully utilized by most query classification systems. In this work, we leverage this information to improve query classification accuracy. We first incorporate search contexts into our framework ...ing a Conditional Random Field (CRF) model. Discriminative training of CRFs is favored over traditional maximum likelihood training because of its robustness to noise. We then adapt self-training to our model to exploit the information in unlabeled queries. By investigating different confidence measures and model selection strategies, we effectively avoid the error-reinforcing nature of self-training. In extensive experiments on real search logs, we achieved an average improvement of around 20% in classification accuracy over other state-of-the-art baselines.
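To make the self-training idea concrete, here is a minimal sketch of generic confidence-thresholded self-training with held-out model selection. It is not the authors' method. As a simplifying assumption, a multinomial naive Bayes classifier over query terms stands in for the session-aware CRF, and the predicted-class posterior serves as the confidence measure. The names `NaiveBayes`, `self_train`, `threshold`, and `max_rounds` are all hypothetical. Training stops as soon as held-out accuracy drops, which is one simple way to limit error reinforcement.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Hypothetical stand-in for the paper's CRF: multinomial NB over query terms."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant

    def fit(self, queries, labels):
        self.classes = sorted(set(labels))
        self.prior = Counter(labels)
        self.n = len(labels)
        self.term_counts = defaultdict(Counter)
        for q, y in zip(queries, labels):
            self.term_counts[y].update(q.split())
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        return self

    def predict_proba(self, query):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            total = sum(self.term_counts[c].values())
            s = math.log(self.prior[c] / self.n)
            for t in query.split():
                s += math.log((self.term_counts[c][t] + self.alpha) / (total + self.alpha * v))
            scores[c] = s
        # Normalize log-scores into posteriors (log-sum-exp for stability).
        m = max(scores.values())
        z = sum(math.exp(s - m) for s in scores.values())
        return {c: math.exp(s - m) / z for c, s in scores.items()}

    def predict(self, query):
        p = self.predict_proba(query)
        return max(p, key=p.get)


def accuracy(model, queries, labels):
    return sum(model.predict(q) == y for q, y in zip(queries, labels)) / len(labels)


def self_train(labeled, unlabeled, dev, threshold=0.9, max_rounds=5):
    """Add confidently pseudo-labeled queries each round; keep the best dev model."""
    queries = [q for q, _ in labeled]
    labels = [y for _, y in labeled]
    dev_q = [q for q, _ in dev]
    dev_y = [y for _, y in dev]
    pool = list(unlabeled)

    best = NaiveBayes().fit(queries, labels)
    best_acc = accuracy(best, dev_q, dev_y)
    model = best
    for _ in range(max_rounds):
        confident, rest = [], []
        for q in pool:
            p = model.predict_proba(q)
            y = max(p, key=p.get)
            (confident if p[y] >= threshold else rest).append(q)
        if not confident:
            break  # nothing passes the confidence threshold
        queries += confident
        labels += [model.predict(q) for q in confident]
        pool = rest
        model = NaiveBayes().fit(queries, labels)
        acc = accuracy(model, dev_q, dev_y)
        if acc < best_acc:
            break  # held-out accuracy dropped: stop before errors reinforce
        best, best_acc = model, acc
    return best, best_acc
```

With a toy set of travel and sports queries, `self_train(labeled, unlabeled, dev, threshold=0.75)` pseudo-labels the unlabeled queries whose top posterior reaches 0.75 and retrains. It returns the retrained model only if held-out accuracy does not fall. The paper studies several confidence measures and selection strategies; this sketch uses just one of each.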
doi:10.1145/2063576.2063598 dblp:conf/cikm/ChenSNC11 fatcat:6yvgsjbxcbf57jbkml4eq43ary