1,067 Hits

IR system evaluation using nugget-based test collections

Virgil Pavlu, Shahzad Rajput, Peter B. Golbus, Javed A. Aslam
2012 Proceedings of the fifth ACM international conference on Web search and data mining - WSDM '12  
We then show how these inferred relevance assessments can be used to perform IR system evaluation, and we discuss in particular reusability and scalability.  ...  The development of information retrieval systems such as search engines relies on good test collections, including assessments of retrieved content.  ...  test collection methodology to IR system evaluation.  ... 
doi:10.1145/2124295.2124343 dblp:conf/wsdm/PavluRGA12 fatcat:3tzpysivnjdk3aerpj6plnwbhe

Novelty and diversity in information retrieval evaluation

Charles L.A. Clarke, Maheedhar Kolla, Gordon V. Cormack, Olga Vechtomova, Azin Ashkan, Stefan Büttcher, Ian MacKinnon
2008 Proceedings of the 31st annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '08  
We demonstrate the feasibility of our approach using a test collection based on the TREC question answering track.  ...  Such objective functions must accurately reflect user requirements, particularly when tuning IR systems and learning ranking functions.  ...  In addition, we describe a test collection exploring our proposal based on the TREC 2005/2006 question answering collections.  ... 
doi:10.1145/1390334.1390446 dblp:conf/sigir/ClarkeKCVABM08 fatcat:6ozn44dfi5ejzfrglkgxah6sqa

Automatic Ground Truth Expansion for Timeline Evaluation

Richard McCreadie, Craig Macdonald, Iadh Ounis
2018 The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval - SIGIR '18  
To this end, several evaluation metrics and labeling methodologies have recently been developed, focusing on information-nugget or cluster-based ground truth representations, respectively.  ...  We show that when considering lower-effectiveness systems, the test collections are robust (the likelihood of systems being mis-ranked is low).  ...  Having to perform reassessment for each new system or summary to be evaluated reduces the value that these test collections bring to IR evaluation.  ... 
doi:10.1145/3209978.3210034 dblp:conf/sigir/McCreadieMO18 fatcat:e72afifhmzcffphepperyqhuny

Efficient Test Collection Construction via Active Learning [article]

Md Mustafizur Rahman, Mucahid Kutlu, Tamer Elsayed, Matthew Lease
2018 arXiv   pre-print
To create a new IR test collection at minimal cost, we must carefully select which documents merit human relevance judgments.  ...  We report experiments on four TREC collections with varying scarcity of relevant documents, reporting labeling accuracy achieved, as well as rank correlation when evaluating participant systems using these  ...  INTRODUCTION: Test collections provide the foundation for Cranfield-based evaluation of information retrieval (IR) systems [14, 35].  ... 
arXiv:1801.05605v2 fatcat:ssaz5gvat5h43njyf5difo7vju

A nugget-based test collection construction paradigm

Shahzad Rajput, Virgil Pavlu, Peter B. Golbus, Javed A. Aslam
2011 Proceedings of the 20th ACM international conference on Information and knowledge management - CIKM '11  
The problem of building test collections is central to the development of information retrieval systems such as search engines.  ...  Starting with a few relevant "nuggets" of information manually extracted from existing TREC corpora, we implement and test a methodology that finds and correctly assesses the vast majority of relevant  ...  Acknowledgment: This material is based upon work supported by the National Science Foundation under Grant No. IIS-1017903.  ... 
doi:10.1145/2063576.2063861 dblp:conf/cikm/RajputPGA11 fatcat:cj7jsyqi7jfyfom6wsxnvggqb4

Overview of WebCLEF 2007 [chapter]

Valentin Jijkoun, Maarten de Rijke
2008 Lecture Notes in Computer Science  
As a consequence, we did not use nugget-based measures for evaluation. Runs: In total, 12 runs were submitted from 4 research groups.  ...  Then, in addition to the character-based measures above, a nugget-based recall can be defined based on the number of nuggets (rather than the lengths of character spans) found by a system.  ... 
doi:10.1007/978-3-540-85760-0_92 fatcat:b6oargknsvbtpjyxrrqeieuqiq
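The WebCLEF 2007 snippet defines nugget-based recall by counting nuggets rather than character spans. A minimal Python sketch, assuming a system's response has already been matched against an assessor-defined nugget list (the matching step and the function name nugget_recall are hypothetical, not taken from the paper):

```python
def nugget_recall(found_nugget_ids, all_nugget_ids):
    """Fraction of the assessor-defined nuggets covered by a system's response."""
    all_set = set(all_nugget_ids)
    if not all_set:
        raise ValueError("need at least one reference nugget")
    # Each nugget counts once, however many times the response repeats it.
    covered = set(found_nugget_ids) & all_set
    return len(covered) / len(all_set)


print(nugget_recall(["n1", "n3", "n3"], ["n1", "n2", "n3", "n4"]))  # 0.5
```

Counting nuggets instead of characters means a long span and a short span that express the same nugget contribute equally.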

Overview of the TREC 2006 Question Answering Track

Hoa Trang Dang, Jimmy J. Lin, Diane Kelly
2006 Text Retrieval Conference  
Multiple assessors were used to judge the importance of information nuggets used to evaluate the responses to ciQA and "Other" questions, resulting in an evaluation that is more stable and discriminative  ...  than one that uses only a single assessor to judge nugget importance.  ...  Answers to ciQA topics consisted of [doc-id, answer-string] pairs, and were evaluated using the same nugget-based methodology that was employed for the main task Other questions.  ... 
dblp:conf/trec/DangLK06 fatcat:mdaa75b4v5czbb227weaxpuuwy

Towards a unified framework for opinion retrieval, mining and summarization

Elena Lloret, Alexandra Balahur, José M. Gómez, Andrés Montoyo, Manuel Palomar
2012 Journal of Intelligent Information Systems  
The evaluation carried out and the results obtained show the appropriateness of the individual components, as well as of the framework as a whole.  ...  The exponential increase of subjective, user-generated content since the birth of the Social Web has led to the necessity of developing automatic text processing systems able to extract, process and present  ...  Nugget-based Evaluation at TAC: Within the Opinion Summarization Pilot task, each summary was evaluated according to its content using the Pyramid method (Nenkova et al., 2007).  ... 
doi:10.1007/s10844-012-0209-4 fatcat:4impsog5hzdqjcixfqxgln2z4m

Amber Gemstones Sorting By Colour

Saulius Sinkevicius, Arunas Lipnickas, Kestas Rimkus
2017 Elektronika ir Elektrotechnika  
The developed system has been used in an automated amber sorting line to increase the quantities of sorted amber nuggets.  ...  This system can be used, for example, in combination with conveyor systems, and in any other case that requires distinguishing objects of many classes in a high-rate flow of objects.  ...  is the number of classifiers in the collective. The classification algorithm goes as follows: 1. The model is tested using the test set.  ... 
doi:10.5755/j01.eie.23.2.17993 fatcat:lu2yj46ferga5ejavyrpu4p4wq

SoochowNLP Team System Description for 2016 KBP Slot Filling and Nugget Detection Tasks

Yu Hong, Yingying Qiu, Zengzhuang Xu, Wenxuan Zhou, Jian Tang, Xiaobin Wang, Liang Yao, Jianmin Yao
2016 Text Analysis Conference  
We submitted a Cold Start Slot Filling (SF) system and a Nugget Detection (ND) system to this year's KBP evaluation.  ...  A new provenance retrieval method and a filler filtering method were used to implement the SF system. For the ND system, we employed a new propagation method.  ...  We collect the seeds from ground-truth triggers of slots in the 2014 slot filling training and test data.  ... 
dblp:conf/tac/HongQXZTWYY16 fatcat:h5qgokaiajfzrm7i3rrwqhv24a

Utility-based information distillation over temporally sequenced documents

Yiming Yang, Abhimanyu Lad, Ni Lao, Abhay Harpale, Bryan Kisiel, Monica Rogati
2007 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval - SIGIR '07  
Answer keys (nuggets) were generated for each query and a semiautomatic procedure was used for acquiring rules that allow automatically matching nuggets against system responses.  ...  Our results show encouraging utility enhancements using the new approach, compared to the baseline systems without incremental learning or the novelty detection components.  ...  Stuart Shulman for their help with collecting and processing the extended TDT4 annotations used in our experiments.  ... 
doi:10.1145/1277741.1277750 dblp:conf/sigir/YangLLHKR07 fatcat:7zbihodsf5aftnkvpf3zdcgyz4

Overview of the TREC 2007 Question Answering Track

Hoa Trang Dang, Diane Kelly, Jimmy J. Lin
2007 Text Retrieval Conference  
ciQA design for evaluation of interactive systems.  ...  The evaluation of factoid and list responses distinguished between answers that were globally correct (with respect to the document collection) and those that were only locally correct (with respect to  ...  Evaluation Methodology: System responses were evaluated using the "nugget pyramid" extension of the nugget-based methodology used in previous TREC QA tasks (Lin and Demner-Fushman, 2006).  ... 
dblp:conf/trec/DangKL07 fatcat:rofh7t2mvbe73nyq25kmeaajre
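The TREC 2007 entry mentions the "nugget pyramid" extension of nugget-based scoring (Lin and Demner-Fushman, 2006). A minimal Python sketch of how such a score is commonly computed for one question, assuming: each nugget's weight is the number of assessors who marked it vital divided by the largest such count; recall is the matched weight over the total weight; precision uses a length allowance of 100 non-whitespace characters per matched nugget; and the two are combined with an F-measure at beta = 3. These constants and the name pyramid_nugget_f are assumptions to verify against the track overview, not values stated in the snippet.

```python
def pyramid_nugget_f(matched, vital_votes, response_length,
                     beta=3.0, chars_per_nugget=100):
    """Nugget-pyramid F-score for one question (sketch).

    matched: ids of nuggets an assessor found in the response
    vital_votes: nugget id -> number of assessors who labelled it "vital"
    response_length: non-whitespace characters in the response
    """
    max_votes = max(vital_votes.values(), default=0)
    if max_votes == 0:
        raise ValueError("at least one nugget must receive a vital vote")
    # Pyramid weight: vital votes normalised by the most-voted nugget.
    weights = {n: v / max_votes for n, v in vital_votes.items()}
    hits = set(matched) & set(weights)
    recall = sum(weights[n] for n in hits) / sum(weights.values())
    # Length allowance: a fixed character budget per matched nugget.
    allowance = chars_per_nugget * len(hits)
    if response_length <= allowance:
        precision = 1.0
    else:
        precision = 1.0 - (response_length - allowance) / response_length
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    # beta > 1 weights recall more heavily than precision.
    return (b2 + 1) * precision * recall / (b2 * precision + recall)


votes = {"a": 4, "b": 2, "c": 0}
print(pyramid_nugget_f(["a", "b", "c"], votes, 600))  # recall 1.0, precision 0.5
```

Weighting by assessor votes is what distinguishes the pyramid variant from a single assessor's binary vital/okay split.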

Evaluating diversified search results using per-intent graded relevance

Tetsuya Sakai, Ruihua Song
2011 Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval - SIGIR '11  
We compare a wide range of traditional and diversified IR metrics after adding graded relevance assessments to the TREC 2009 Web track diversity task test collection which originally had binary relevance  ...  Our results show that diversified IR experiments with a given number of topics can be as reliable as traditional IR experiments with the same number of topics, provided that the right metrics are used.  ...  Reducing the Diversity Test Collection: In addition to evaluating diversified IR metrics using TR09DIV+gr, we evaluated traditional graded-relevance IR metrics on which the diversified IR metrics are based  ... 
doi:10.1145/2009916.2010055 dblp:conf/sigir/SakaiS11 fatcat:rh6vfuva3fcmjbpgrujhuvmnf4

Towards a Multi-Stream Question Answering-As-XML-Retrieval Strategy

David Ahn, Sisay Fissaha Adafre, Valentin Jijkoun, Karin Müller, Maarten de Rijke, Erik F. Tjong Kim Sang
2005 Text Retrieval Conference  
In order to test this, we created a run (uams05rnk) in which the answers of the complete system had been reranked based on their frequency.  ...  The method uses IR and NLP techniques to locate documents containing information about the topic, and extract nuggets from the retrieved documents.  ... 
dblp:conf/trec/AhnAJMRS05 fatcat:vg2njg3w4rhjlgxsfgspyzog7e

From Babel to Knowledge

Daniel J. Cohen
2006 D-Lib Magazine  
The TREC 2004 Question Answering track contained a single task in which question series were used to define a set of targets.  ...  Applying the combined measure on a per-series basis produces a QA task evaluation that more closely mimics classic document retrieval evaluation.  ...  A system's response to a list question was scored using instance precision (IP) and instance recall (IR) based on the list of known instances.  ... 
doi:10.1045/march2006-cohen fatcat:4ddkch4mljfb5cxwwkno5fbzdm
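The last snippet scores list questions with instance precision (IP) and instance recall (IR). A minimal Python sketch, assuming IP is the number of distinct correct instances divided by the number of instances returned, IR is the same count divided by the number of known instances, and the two are combined with F1 (beta = 1). Treating "correct" as exact membership in the known-instance list is a simplifying assumption, since TREC assessors judged answers manually; the name list_question_scores is hypothetical.

```python
def list_question_scores(returned, known):
    """Return (instance precision, instance recall, F1) for one list question."""
    known_set = set(known)
    # Duplicates in the response still count in the IP denominator but
    # contribute only one distinct correct instance.
    distinct_correct = set(returned) & known_set
    ip = len(distinct_correct) / len(returned) if returned else 0.0
    ir = len(distinct_correct) / len(known_set) if known_set else 0.0
    f1 = 2 * ip * ir / (ip + ir) if ip + ir > 0 else 0.0
    return ip, ir, f1


print(list_question_scores(["a", "b", "x", "a"], ["a", "b", "c", "d"]))  # (0.5, 0.5, 0.5)
```

Penalizing duplicates through the IP denominator discourages systems from padding a list with repeated answers.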