Document selection methodologies for efficient and effective learning-to-rank

Javed A. Aslam, Evangelos Kanoulas, Virgil Pavlu, Stefan Savev, Emine Yilmaz
2009 Proceedings of the 32nd international ACM SIGIR conference on Research and development in information retrieval - SIGIR '09  
Learning-to-rank has attracted great attention in the IR community. Much thought and research has been placed on query-document feature extraction and development of sophisticated learning-to-rank algorithms. However, relatively little research has been conducted on selecting documents for learning-to-rank data sets nor on the effect of these choices on the efficiency and effectiveness of learning-to-rank algorithms. In this paper, we employ a number of document selection methodologies, widely
more » ... sed in the context of evaluationdepth-k pooling, sampling (infAP, statAP), active-learning (MTC), and on-line heuristics (hedge). Certain methodologies, e.g. sampling and active-learning, have been shown to lead to efficient and effective evaluation. We investigate whether they can also enable efficient and effective learningto-rank. We compare them with the document selection methodology used to create the LETOR datasets. Further, all of the utilized methodologies are different in nature, and thus they construct training data sets with different properties, such as the proportion of relevant documents in the data or the similarity among them. We study how such properties affect the efficiency, effectiveness, and robustness of learning-to-rank collections.
doi:10.1145/1571941.1572022 dblp:conf/sigir/AslamKPSY09 fatcat:7eunxfursfdjfh4drowqjygcbu