Estimating Pool-depth on Per Query Basis

Sukomal Pal, Mandar Mitra, Samaresh Maiti
2010 NTCIR Conference on Evaluation of Information Access Technologies  
Instead of using an a priori fixed depth, variable pool-depth based pooling is adopted. The pool for each topic is incrementally built and judged interactively.  ...  When no new relevant document is found for a reasonably long run of pool-depths, pooling can be stopped for the topic.  ...  Within the traditional framework of the Cranfield paradigm, it offers an interactive pooling approach based on variable pool-depth per query.  ... 
dblp:conf/ntcir/PalMM10 fatcat:5xcjycnzmzgdnpxrgwy2zqmfle

UQV100: A Test Collection with Query Variability

Peter Bailey, Alistair Moffat, Falk Scholer, Paul Thomas
2016 Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval - SIGIR '16  
Qualified crowd workers made relevance judgments relative to the backstories, using a relevance scale similar to the original TREC approach; first to a pool depth of ten per query, then deeper on a set  ...  The backstories, query variations, normalized and spell-corrected queries, effort estimates, run outputs, and relevance judgments are made available collectively as the UQV100 test collection.  ...  on a per-document retrieved basis.  ... 
doi:10.1145/2911451.2914671 dblp:conf/sigir/BaileyMST16 fatcat:badjuucmcrd6hbbb47xszn4rju

How reliable are the results of large-scale information retrieval experiments?

Justin Zobel
1998 Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '98  
We propose a new pooling strategy that can significantly increase the number of relevant documents found for given effort, without compromising fairness.  ...  Simple regression on the per-query number of new relevant documents found at each pool depth, although highly approximate, is a good basis for choice of queries for further judgement effort.  ...  Figure 2: Total number of new relevant documents at each pool depth, actual and estimated, for queries 251-300 from TREC 5. On left, depths 3-100.  ... 
doi:10.1145/290941.291014 dblp:conf/sigir/Zobel98 fatcat:hgtme6o5y5dqfp7i2w3bgawxtu

Evaluation effort, reliability and reusability in XML retrieval

Sukomal Pal, Mandar Mitra, Jaap Kamps
2010 Journal of the American Society for Information Science and Technology  
Finally, they observe that for a fixed amount of effort, judging shallow pools for many queries is better than judging deep pools for a smaller set of queries.  ...  What is the minimum pool/query-set size that can be used to reliably evaluate systems?  ...  The most-effective use of available manpower may be made by choosing the pool-depth/pool size on a per query basis.  ... 
doi:10.1002/asi.21403 fatcat:zusa7ro7brgwfpnnkxpvu3degq

Strategic system comparisons via targeted relevance judgments

Alistair Moffat, William Webber, Justin Zobel
2007 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '07  
Given a collection of documents and queries, and a set of systems being compared, a standard approach to forming judgments is to manually examine all documents that are highly ranked by any of the systems  ...  Using rank-biased precision, a recently proposed effectiveness measure, we show that judging around 200 documents for each of 50 queries in a TREC-scale system evaluation containing over 100 runs is sufficient  ...  Up to two runs per research group were used as the basis for the pool, making a total of 71 pooled runs.  ... 
doi:10.1145/1277741.1277806 dblp:conf/sigir/MoffatWZ07 fatcat:lytymtvrn5cwbfp7t4hmcey5se

Rank-biased precision for measurement of retrieval effectiveness

Alistair Moffat, Justin Zobel
2008 ACM Transactions on Information Systems  
These are typically intended to provide a quantitative single-value summary of a document ranking relative to a query. However, many of these measures have failings.  ...  Rank-biased precision is derived from a simple model of user behavior, is robust if answer rankings are extended to greater depths, and allows accurate quantification of experimental uncertainty, even  ...  The simplest case is when the ranking is calculated to a depth of d answers per query, and the contributions from depth d + 1 on are not available.  ... 
doi:10.1145/1416950.1416952 fatcat:qpe7245dgfelvn5hwnjrjyuiuq

A comparison of pooled and sampled relevance judgments

Ian Soboroff
2007 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '07  
Pooling is the most common technique used to build modern test collections. Evidence is mounting that pooling may not yield reusable test collections for very large document sets.  ...  The sample judgments rank systems somewhat differently than the pool. Some analysis and plans for further research are discussed.  ...  This sample reached to a depth that varied per topic depending on the number of relevant documents in the depth-50 pool.  ... 
doi:10.1145/1277741.1277908 dblp:conf/sigir/Soboroff07 fatcat:nxpsjcdiwjhb5la4b6twdfp5tq

Overview of the TREC 2007 Legal Track

Stephen Tomlinson, Douglas W. Oard, Jason R. Baron, Paul Thompson
2007 Text Retrieval Conference  
TREC 2007 was the second year of the Legal Track, which focuses on evaluation of search technology for discovery of electronically stored information in litigation and regulatory settings.  ...  Feedback (two-pass search in a controlled setting with some relevant and nonrelevant documents manually marked after the first pass) and Interactive (in which real users could iteratively refine their queries  ...  On average (per topic), the estimated number of non-relevant documents in the pool was 298,678, and the estimated number of gray documents in the pool was 4,303.  ... 
dblp:conf/trec/TomlinsonOBT07 fatcat:l3omolwxrfhv5omuqbtwftfet4

RMIT at TREC 2010 Blog Track: Faceted Blog Distillation Task

Zhixin Zhou, Xiuzhen Zhang, Phil Vines
2010 Text Retrieval Conference  
An SVM classifier has been trained on Blog 06 collection to produce the opinion scores for each post. The cross entropy is used to evaluate posts for the in-depth versus shallow facet.  ...  For the baseline task, we adopted the BM25 model implemented in the Zettair search engine to establish a retrieval system of blog posts based on topic relevance.  ...  We performed the transformation on a per-topic basis, as we expected different distributions of post similarity scores for each topic.  ... 
dblp:conf/trec/ZhouZV10 fatcat:pftbz2us7farbj3yqnm7ldslmy

RMIT at the 2018 TREC CORE Track

Rodger Benham, Luke Gallagher, Joel Mackenzie, Binsheng Liu, Xiaolu Lu, Falk Scholer, J. Shane Culpepper, Alistair Moffat
2018 Text Retrieval Conference  
Our thesis is that over-reliance on a single query can lead to suboptimal performance, and that by creating multiple query representations for an information need and combining the relevance signals through  ...  It forms the basis of web search, question-answering, and a new generation of virtual assistants being developed by several of the largest software companies in the world.  ...  On average each topic had a pool-depth of 10.40, compared to the NIST assessment average pool depth per-topic of 524.66.  ... 
dblp:conf/trec/BenhamGML0SCM18 fatcat:wo453fdgnnbnxnwwv3hba6f7pm

Enhancing Flood Impact Analysis using Interactive Retrieval of Social Media Images [article]

Björn Barz, Kai Schröter, Moritz Münch, Bin Yang, Andrea Unger, Doris Dransch, Joachim Denzler
2019 arXiv pre-print
To evaluate this approach, we introduce a new dataset of 3,710 flood images, annotated by domain experts regarding their relevance with respect to three tasks (determining the flooded area, inundation depth  ...  This limitation could be alleviated by leveraging information contained in images of the event posted on social media platforms, so-called "Volunteered Geographic Information (VGI)".  ...  We choose the two thresholds needed for ITML on a per-query basis as follows: All pairs of relevant images should be closer to each other than half the distance between the query and the first irrelevant  ... 
arXiv:1908.03361v1 fatcat:kgjsvmp5b5f2pfdyvnh5mxkikq

A Test Collection for Matching Patients to Clinical Trials

Bevan Koopman, Guido Zuccon
2016 Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval - SIGIR '16  
ad-hoc); iv) a user provided estimate of how many trials they expect each patient topic would be eligible for; and v) relevance assessments by medical professionals.  ...  We present a test collection to study the use of search engines for matching eligible patients (the query) to clinical trials (the document).  ...  Note that ad-hoc queries generated more than one run per topic per system, i.e., on average each system generated 8.2 runs per topic.  ... 
doi:10.1145/2911451.2914672 dblp:conf/sigir/KoopmanZ16 fatcat:ndy2wurrd5ggbdvfgqq5nvjxxm

TREC 2017 Common Core Track Overview

James Allan, Donna Harman, Evangelos Kanoulas, Dan Li, Christophe Van Gysel, Ellen M. Voorhees
2017 Text Retrieval Conference  
in alphabetical order. Participants have the opportunity to train their systems over the TREC 2004 Robust track (which, to some extent, alters the participants' task to a routing task, i.e. fixed queries  ...  depth-100 pools.  ...  Individual topic budgets were set based on estimates of the number of nonrelevant there would be in the combined top-10 pools using the first stage judgments to make the estimates, and then allocated as  ... 
dblp:conf/trec/AllanHKLGV17 fatcat:sj7iwu27wbfwnm2wbenlgbxmfm

Modeling Relevance as a Function of Retrieval Rank [chapter]

Xiaolu Lu, Alistair Moffat, J. Shane Culpepper
2016 Lecture Notes in Computer Science  
are presumed, and extrapolated metric scores are computed based on models developed from those shallow pools.  ...  Here we consider the same problem from another perspective, and investigate the relationship between relevance likelihood and retrieval rank, seeking to identify plausible methods for estimating document  ...  Model G H () is a hybrid that selects the best of the other models on a per-topic basis.  ... 
doi:10.1007/978-3-319-48051-0_1 fatcat:fwgj35uvzvekzdi3r54iedf2ru

Deep regional feature pooling for video matching

Yan Bai, Jie Lin, Vijay Chandrasekhar, Yihang Lou, Shiqi Wang, Ling-Yu Duan, Tiejun Huang, Alex Kot
2017 2017 IEEE International Conference on Image Processing (ICIP)  
We aim to analyze the joint effect of ROI (Region of Interest) size and pooling moment on video matching performance.  ...  Empirical studies on the challenging MPEG CDVA dataset demonstrate that performance trends are consistent between the estimation and experimental results, though the theoretical model is largely simplified  ...  In this work, we estimate video matching performance by simply analyzing the matching function k(., .) on per-channel basis (i.e., equal contribution for all channels). Matching function.  ... 
doi:10.1109/icip.2017.8296307 dblp:conf/icip/BaiLCLWDHK17 fatcat:arvmeiws3rf4hjjzbl7hz3fq5e