Document re-ranking using cluster validation and label propagation

Lingpeng Yang, Donghong Ji, Guodong Zhou, Yu Nie, Guozheng Xiao
2006 Proceedings of the 15th ACM international conference on Information and knowledge management - CIKM '06  
This paper proposes a novel document re-ranking approach for information retrieval, based on a label propagation-based semi-supervised learning algorithm that exploits the intrinsic structure underlying large document collections. Since labeled relevant or irrelevant documents are generally not available in IR, our approach extracts pseudo-labeled documents from the ranking list of the initial retrieval. For pseudo-relevant documents, we determine a cluster of documents from the top ones via cluster validation-based k-means clustering; for pseudo-irrelevant ones, we pick a set of documents from the bottom ones. The documents are then re-ranked via label propagation. Evaluation on benchmark corpora shows that the approach achieves significant improvement over standard baselines and performs better than other related approaches.
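The label propagation step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the graph construction or the update rule, so the cosine-similarity graph, the clamped iterative update, and the names `cosine`, `rerank`, `n_rel`, and `n_irr` are all assumptions made for illustration. In particular, the sketch simply takes the top `n_rel` documents as pseudo-relevant seeds, standing in for the cluster that the paper selects with cluster validation-based k-means.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-negative term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def rerank(vectors, n_rel=2, n_irr=2, iters=50):
    """Re-rank documents given in initial-retrieval order.

    Assumption: the top n_rel documents are pseudo-relevant (label 1.0) and
    the bottom n_irr are pseudo-irrelevant (label 0.0). The paper instead
    picks the pseudo-relevant set by clustering the top documents.
    """
    n = len(vectors)
    # Similarity graph without self-loops.
    w = [[cosine(vectors[i], vectors[j]) if i != j else 0.0
          for j in range(n)] for i in range(n)]
    # Row-normalise weights into transition probabilities.
    t = []
    for row in w:
        total = sum(row)
        t.append([x / total if total else 0.0 for x in row])

    seeds = {i: 1.0 for i in range(n_rel)}
    seeds.update({i: 0.0 for i in range(n - n_irr, n)})
    score = [seeds.get(i, 0.5) for i in range(n)]

    for _ in range(iters):
        # Each document takes the weighted average of its neighbours' scores.
        new = [sum(t[i][j] * score[j] for j in range(n)) for i in range(n)]
        # Clamp the pseudo-labeled documents back to their labels.
        for i, label in seeds.items():
            new[i] = label
        score = new

    # Stable sort: ties keep their initial-retrieval order.
    order = sorted(range(n), key=lambda i: -score[i])
    return order, score


# Toy example: document 2 is off-topic but ranked above on-topic document 3.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.8, 0.2], [0.1, 0.9], [0.0, 1.0]]
order, score = rerank(docs)
```

In this toy input, the unlabeled document that resembles the top-ranked seeds receives a higher propagated score than the one that resembles the bottom-ranked seeds, so the two swap places in the final ordering.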
doi:10.1145/1183614.1183713 dblp:conf/cikm/YangJZNX06 fatcat:checcs6atrgulfrh23lxt4hz6m