Evaluating different methods of estimating retrieval quality for resource selection

Henrik Nottelmann, Norbert Fuhr
2003 Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '03
In a federated digital library system, it is too expensive to query every accessible library. Resource selection is the task of deciding to which libraries a query should be routed. Most existing resource selection algorithms compute a library ranking heuristically. In contrast, the decision-theoretic framework (DTF) takes a different approach with a stronger theoretical foundation: it computes a selection that minimises the overall costs (e.g. retrieval quality, time, money) of the distributed retrieval. For estimating retrieval quality, the recall-precision function is proposed. In this paper, we introduce two new methods. The first computes the empirical distribution of the probabilities of relevance from a small library sample and assumes it to be representative of the whole library. The second assumes that the indexing weights follow a normal distribution, which leads to a normal distribution for the document scores. Furthermore, we present the first evaluation of DTF by comparing this theoretical approach with the heuristic state-of-the-art system CORI; we find that DTF outperforms CORI in most cases.
Keywords: Resource selection, decision-theoretic framework, formal models, normal distribution, evaluation
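The following is a minimal Python sketch of how the empirical estimation method and a DTF-style selection might fit together. It rests on several simplifying assumptions and is not the authors' implementation. First, each sampled document is taken to stand for library_size/m library documents in rank order, which gives an estimate of E[r(s)], the expected number of relevant documents among the top s. Second, costs follow a linear model with hypothetical parameters c_rel, c_nonrel and c_per_doc, so that costs are incurred per relevant document, per non-relevant document and per retrieved document (time or money). Third, the allocation of n documents across libraries that minimises total expected cost is found by dynamic programming. All function names are hypothetical, and the normal-distribution method is not shown.

```python
from typing import Callable, List, Sequence, Tuple


def expected_relevant_empirical(sample_probs: Sequence[float],
                                library_size: int) -> Callable[[int], float]:
    """Hypothetical helper: estimate E[r(s)] for the top s documents of a library
    from probabilities of relevance computed on a small (non-empty) sample,
    assuming the sample is representative of the whole library."""
    ranked = sorted(sample_probs, reverse=True)
    scale = library_size / len(ranked)  # library documents per sampled document

    def expected(s: int) -> float:
        total = 0.0
        remaining = float(min(s, library_size))
        for p in ranked:  # walk down the ranking, sampled doc by sampled doc
            take = min(scale, remaining)
            total += take * p
            remaining -= take
            if remaining <= 0:
                break
        return total

    return expected


def expected_cost(expected_rel: Callable[[int], float], c_rel: float,
                  c_nonrel: float, c_per_doc: float) -> Callable[[int], float]:
    """Assumed linear cost model: relevant, non-relevant and per-document costs."""
    def cost(s: int) -> float:
        r = expected_rel(s)
        return c_rel * r + c_nonrel * (s - r) + c_per_doc * s
    return cost


def select_documents(costs: List[Callable[[int], float]], n: int,
                     capacities: List[int]) -> Tuple[float, List[int]]:
    """Choose how many documents to retrieve from each library so that the
    total expected cost of n documents is minimal (exact DP, O(L * n^2))."""
    if sum(capacities) < n:
        raise ValueError("libraries cannot supply n documents")
    inf = float("inf")
    best = [0.0] + [inf] * n  # best[k]: min cost of k docs from libraries so far
    choices: List[List[int]] = []
    for cost, cap in zip(costs, capacities):
        new, pick = [inf] * (n + 1), [0] * (n + 1)
        for k in range(n + 1):
            for s in range(min(k, cap) + 1):
                if best[k - s] == inf:
                    continue
                c = best[k - s] + cost(s)
                if c < new[k]:
                    new[k], pick[k] = c, s
        best = new
        choices.append(pick)
    # Backtrack from the last library to recover the per-library allocation.
    alloc, k = [0] * len(costs), n
    for i in range(len(costs) - 1, -1, -1):
        alloc[i] = choices[i][k]
        k -= alloc[i]
    return best[n], alloc


# Usage with made-up sample probabilities; cost = expected non-relevant documents.
lib_a = expected_cost(expected_relevant_empirical([0.9, 0.8, 0.1, 0.05], 40), 0.0, 1.0, 0.0)
lib_b = expected_cost(expected_relevant_empirical([0.3, 0.2], 20), 0.0, 1.0, 0.0)
total, alloc = select_documents([lib_a, lib_b], 20, [40, 20])
print(total, alloc)  # 3.0 [20, 0]
```

Dynamic programming is used here instead of a greedy ranking so that the sketch finds the cost-minimal allocation even when a library's marginal costs are not monotone. Setting c_per_doc > 0 for a slow or expensive library shifts documents toward the others.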
doi:10.1145/860435.860489 dblp:conf/sigir/NottelmannF03 fatcat:scgvhl3k25cltnwm5tcep5hoam