On Penalising Late Arrival of Relevant Documents in Information Retrieval Evaluation with Graded Relevance

Tetsuya Sakai
2007 NTCIR Conference on Evaluation of Information Access Technologies  
Large-scale information retrieval evaluation efforts such as TREC and NTCIR have tended to adhere to binary-relevance evaluation metrics, even when graded relevance data were available. However, the NTCIR-6 Crosslingual Task has finally started adopting graded-relevance metrics, though only as additional metrics. This paper compares three existing graded-relevance metrics that were mentioned in the Call for Participation of the NTCIR-6 Crosslingual Task in terms of the ability to control how severely "late arrival" of relevant documents should be penalised. We argue and demonstrate that Q-measure is more flexible than normalised Discounted Cumulative Gain and generalised Average Precision. We then suggest a brief guideline for conducting a reliable information retrieval evaluation with graded relevance.
dblp:conf/ntcir/Sakai07 fatcat:sjkrsviwirf5dfp3vr3b4eeuuy