
Panning requirement nuggets in stream of software maintenance tickets

Senthil Mani, Karthik Sankaranarayanan, Vibha Singhal Sinha, Premkumar Devanbu
2014 Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering - FSE 2014  
In this paper, we propose an approach to automatically analyze problem tickets to discover groups of problems being reported in them and provide meaningful, descriptive labels to help interpret these groups  ...  We provide detailed experiments to quantitatively and qualitatively evaluate our approach.  ...  Every ticket had a Title field containing a short description of the problem. All subjects had tickets created both automatically (by systems) and by humans.  ... 
doi:10.1145/2635868.2635897 dblp:conf/sigsoft/ManiSSD14 fatcat:7wnl5tplfnfl3lgqzx4u73snai
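A minimal sketch of the general idea in the snippet (grouping tickets and giving each group a descriptive label), assuming a TF-IDF + KMeans pipeline over ticket Title fields; the vectorizer, cluster count, and sample tickets are illustrative, not the paper's actual method:

```python
# Sketch only: cluster ticket titles with TF-IDF + KMeans, then label
# each group by the top-weighted terms of its centroid.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

titles = [                                  # hypothetical tickets
    "Database connection timeout on login",
    "Login page throws 500 error",
    "Nightly batch job fails with out of memory",
]

vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(titles)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
terms = vectorizer.get_feature_names_out()

for c in range(km.n_clusters):
    # Descriptive label = highest-weight terms in the cluster centroid.
    top = km.cluster_centers_[c].argsort()[::-1][:3]
    print(f"group {c}: {', '.join(terms[i] for i in top)}")
```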

Nuggeteer

Gregory Marton, Alexey Radul
2006 Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL)  
Human evaluators provide informal descriptions of each nugget, and judgements (assignments of nuggets to responses) for each response submitted by participants.  ...  Nuggeteer, by contrast, uses both the human descriptions and the human judgements, and makes binary decisions about each response, so that the end result is as interpretable as the official score.  ...  Approach: Nuggeteer builds one binary classifier per nugget for each question, based on n-grams (up to trigrams) in the description and optionally in any provided judgement files.  ... 
doi:10.3115/1220835.1220883 fatcat:cirswvfslreanarbocpom63uiy
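The snippet names the core mechanism (one binary classifier per nugget, over n-grams up to trigrams), so a rough sketch follows; the overlap score and the 0.3 threshold are illustrative stand-ins for Nuggeteer's actual weighting and per-nugget thresholds:

```python
# Rough sketch: one binary classifier per nugget, built from n-grams
# (up to trigrams) of the nugget description plus any positively judged
# responses. Score and threshold are illustrative assumptions.
def ngrams(text, n_max=3):
    toks = text.lower().split()
    return {tuple(toks[i:i + n]) for n in range(1, n_max + 1)
            for i in range(len(toks) - n + 1)}

def make_nugget_classifier(description, judged_positive=(), threshold=0.3):
    evidence = ngrams(description)
    for resp in judged_positive:
        evidence |= ngrams(resp)          # fold in judgement-file positives

    def contains_nugget(response):
        cand = ngrams(response)
        score = len(cand & evidence) / max(len(cand), 1)
        return score >= threshold         # binary decision per response
    return contains_nugget

clf = make_nugget_classifier("the first manned moon landing was in 1969")
print(clf("Apollo 11 achieved the first manned moon landing in 1969"))  # True
```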

Northeastern University Runs at the TREC12 Crowdsourcing Track

Maryam Bashir, Jesse Anderton, Jie Wu, Matthew Ekstrand-Abueg, Peter B. Golbus, Virgil Pavlu, Javed A. Aslam
2012 Text Retrieval Conference  
These preferences are then extended to relevance judgments through the use of expectation maximization and the Elo rating system. Our third approach is based on our Nugget-based evaluation paradigm.  ...  The goal of the TREC 2012 Crowdsourcing Track was to evaluate approaches to crowdsourcing high quality relevance judgments for images and text documents.  ...  The nuggets are used to find relevant documents, using a technique based on our nugget-based evaluation framework [9].  ... 
dblp:conf/trec/BashirAWEGPA12 fatcat:v2ja3fo7wfgwxamffgyjc7nrsm
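A sketch of the Elo step alone, turning pairwise crowd preferences into a graded relevance ordering (the run also uses expectation maximization, omitted here; the K-factor, base rating, and example pairs are assumed values, not the track submission's):

```python
# Elo over document pairs: each worker preference updates the ratings
# of the preferred and non-preferred documents.
from collections import defaultdict

K = 32.0                                   # illustrative update step
ratings = defaultdict(lambda: 1500.0)      # illustrative base rating

def expected(a, b):
    """Probability that document a beats b under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((ratings[b] - ratings[a]) / 400.0))

def update(winner, loser):
    gain = K * (1.0 - expected(winner, loser))
    ratings[winner] += gain
    ratings[loser] -= gain

# hypothetical crowd preferences: (preferred document, other document)
for win, lose in [("d1", "d2"), ("d1", "d3"), ("d2", "d3")]:
    update(win, lose)

# Sorting by rating yields a graded relevance ordering.
print(sorted(ratings.items(), key=lambda kv: -kv[1]))
```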

Bridging the gap between extractive and abstractive summaries: Creation and evaluation of coherent extracts from heterogeneous sources

Darina Benikova, Margot Mieskes, Christian M. Meyer, Iryna Gurevych
2016 International Conference on Computational Linguistics  
Coherent extracts are a novel type of summary combining the advantages of manually created abstractive summaries, which are fluent but difficult to evaluate, and low-quality automatically created extractive  ...  We find that our manually created corpus is of high quality and that it has the potential to bridge the gap between reference corpora of abstracts and automatic methods producing extracts.  ...  such as coherence and structure, which are important factors in human judgements, are not covered by a state-of-the-art ROUGE-based evaluation.  ... 
dblp:conf/coling/BenikovaMMG16 fatcat:er664p2ovzfztiii7odl6mllgq

Overview of WebCLEF 2007 [chapter]

Valentin Jijkoun, Maarten de Rijke
2008 Lecture Notes in Computer Science  
The WebCLEF 2007 task combines insights gained from previous editions of WebCLEF 2005-2006 [6, 1] and the WiQA 2006 pilot [4, 3], and goes beyond the navigational queries considered at WebCLEF 2005 and  ...  The task definition, which goes beyond traditional navigational queries and is concerned with undirected information search goals, combines insights gained at previous editions of WebCLEF and of the WiQA  ...  As a consequence, we did not use nugget-based measures for evaluation. Runs: In total, 12 runs were submitted from 4 research groups.  ... 
doi:10.1007/978-3-540-85760-0_92 fatcat:b6oargknsvbtpjyxrrqeieuqiq

A Hybrid Approach for QA Track Definitional Questions

Sasha Blair-Goldensohn, Kathleen R. McKeown, Andrew Hazen Schlaikjer
2003 Text Retrieval Conference  
We present an overview of DefScriber, a system developed at Columbia University that combines knowledge-based and statistical methods to answer definitional questions of the form, "What is X?"  ...  We are also thankful for the generous and thoughtful contributions of colleagues at Columbia University and at the University of Colorado-Boulder.  ...  These judgements may be due in part to the need for higher-level inference over response sentences and nuggets to see their connections.  ... 
dblp:conf/trec/Blair-GoldensohnMS03 fatcat:tp5k2gqhg5bddkymorl6dmsko4

The effect of expanding relevance judgements with duplicates

Gaurav Baruah, Adam Roegiest, Mark D. Smucker
2014 Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval - SIGIR '14  
We recommend adding exact duplicate sentences to the set of relevance judgements in order to obtain a more accurate estimate of system performance.  ...  Including new sentences that are exact duplicates of the previously judged sentences may allow for better estimation of performance metrics and enhance the reusability of a test collection.  ...  Nugget-based evaluation [4, 6], where identified relevant material is representative of relevance, is aimed towards automatic identification of nuggets in the whole collection.  ... 
doi:10.1145/2600428.2609534 dblp:conf/sigir/BaruahRS14 fatcat:ok4pdqz63rfazaoondunprjpni
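A minimal sketch of the recommended expansion, propagating each judged sentence's label to its exact duplicates in the collection; matching on a lowercased, whitespace-normalized form is our assumption, since the paper concerns exact duplicate sentences:

```python
# Expand relevance judgements so exact duplicates inherit labels.
import hashlib

def norm_key(sentence):
    # assumed normalization: case-fold and collapse whitespace
    return hashlib.sha1(" ".join(sentence.lower().split()).encode()).hexdigest()

def expand_judgements(qrels, collection):
    """qrels: {sentence_id: label}; collection: {sentence_id: text}."""
    label_by_key = {norm_key(collection[sid]): lbl for sid, lbl in qrels.items()}
    expanded = dict(qrels)
    for sid, text in collection.items():
        key = norm_key(text)
        if sid not in expanded and key in label_by_key:
            expanded[sid] = label_by_key[key]  # duplicate inherits label
    return expanded
```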

RPI BLENDER TAC-KBP2015 System Description

Yu Hong, Di Lu, Dian Yu, Xiaoman Pan, Xiaobin Wang, Yadong Chen, Lifu Huang, Heng Ji
2015 Text Analysis Conference  
The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government.  ...  W911NF-09-2-0053, NSF CAREER Award IIS-1523198, AFRL DREAM project, gift awards from IBM, Google and Bosch.  ...  In this event nugget coreference resolution, we extend it to retrieve related images automatically, though we did not use it for the evaluation because we were not allowed to use web access.  ... 
dblp:conf/tac/HongLYPHJHWC15 fatcat:x6xbtxnlunf77fikippeiwsbh4

Answering Multiple Questions on a Topic From Heterogeneous Resources

Boris Katz, Matthew W. Bilotti, Sue Felshin, Aaron Fernandes, Wesley Hildebrandt, Roni Katzir, Jimmy J. Lin, Daniel Loreto, Gregory Marton, Federico Mora, Özlem Uzuner
2004 Text Retrieval Conference  
Another problem in comparing precision was that we used data from the TREC12 judgements, and possibly some of the data from the newer pattern judgements, in the evaluation.  ...  In each case, they were asked to mark target-nugget pairs, where the nugget was a good description of the target.  ... 
dblp:conf/trec/KatzBFFHKLLMMU04 fatcat:6o74z4ry6vd3bekp76bo2d372q

Human question answering performance using an interactive document retrieval system

Mark D. Smucker, James Allan, Blagovest Dachev
2012 Proceedings of the 4th Information Interaction in Context Symposium on - IIIX '12  
To achieve superior performance, future QA systems should combine the flexibility and precision of IR systems with the ease-of-use and recall advantages of QA systems.  ...  Every day, people widely use information retrieval (IR) systems to find documents that answer their questions.  ...  Evaluation: The ciQA track uses a nugget-based evaluation. Each run may return as many answers to each question as desired up to a 7000 non-whitespace character limit.  ... 
doi:10.1145/2362724.2362735 dblp:conf/iiix/SmuckerAD12 fatcat:brijt767lfgwnhmgl2otpt4wji
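The nugget-based scoring referred to here follows the standard TREC formulation: recall over vital nuggets, precision via a length allowance of 100 non-whitespace characters per matched nugget, and an F-measure weighted toward recall (β = 3 in the later TREC QA tracks; treat it as a parameter). A sketch, with nugget-to-answer matching assumed to come from assessor judgements:

```python
# Standard TREC nugget F-measure: length allowance of 100 non-whitespace
# characters per matched nugget; recall counts vital nuggets only.
def nugget_f(vital_matched, total_vital, all_matched, answer_text, beta=3.0):
    length = sum(len(tok) for tok in answer_text.split())  # non-whitespace chars
    allowance = 100 * all_matched
    precision = 1.0 if length <= allowance else 1.0 - (length - allowance) / length
    recall = vital_matched / total_vital if total_vital else 0.0
    if precision + recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (b2 + 1) * precision * recall / (b2 * precision + recall)

print(nugget_f(vital_matched=2, total_vital=4, all_matched=3,
               answer_text="a 250-character answer " * 10))
```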

Real-Time Web Scale Event Summarization Using Sequential Decision Making [article]

Chris Kedzie, Fernando Diaz, Kathleen McKeown
2016 arXiv   pre-print
We present a system based on sequential decision making for the online summarization of massive document streams, such as those found on the web. Given an event of interest (e.g.  ...  We demonstrate a 28.3% improvement in summary F1 and a 43.8% improvement in time-sensitive F1 metrics.  ...  Any opinions, findings and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the NSF.  ... 
arXiv:1605.03664v1 fatcat:2ep3nkomk5bx5esns2lgh2ncry
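The paper learns its select/skip policy with sequential decision making; the sketch below substitutes a fixed cosine-novelty threshold for that learned policy, just to show the shape of the online decision loop (the threshold and tokenization are assumptions):

```python
# Online summarization loop: for each arriving sentence, decide
# select/skip; here "select" means sufficiently novel w.r.t. the
# summary built so far (a stand-in for the learned policy).
import math
from collections import Counter

def cosine(a, b):
    num = sum(a[t] * b.get(t, 0) for t in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def stream_summarize(sentence_stream, novelty_threshold=0.5):
    summary, vecs = [], []
    for sent in sentence_stream:            # sentences arrive one at a time
        vec = Counter(sent.lower().split())
        if all(cosine(vec, v) < novelty_threshold for v in vecs):
            summary.append(sent)
            vecs.append(vec)
    return summary
```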

Overview of WebCLEF 2008 [chapter]

Valentin Jijkoun, Maarten de Rijke
2009 Lecture Notes in Computer Science  
We detail the task, the assessment procedure, the evaluation measures and results.  ...  We also give an analysis of the evaluation measures and differences between the participating systems.  ...  This is unfortunate, because, as [6] argues, the strict precision/recall-based evaluation of the task does not allow us to reuse the human judgements for evaluating runs that humans have not assessed  ... 
doi:10.1007/978-3-642-04447-2_102 fatcat:l2azihwoazg7hoqs4hwgnzyuoq

Adaptations of ROUGE and BLEU to Better Evaluate Machine Reading Comprehension Task

An Yang, Kai Liu, Jing Liu, Yajuan Lyu, Sujian Li
2018 Proceedings of the Workshop on Machine Reading for Question Answering  
Current evaluation metrics for question-answering-based machine reading comprehension (MRC) systems generally focus on the lexical overlap between candidate and reference answers, such as ROUGE and BLEU  ...  However, bias may appear when these metrics are used for specific question types, especially questions inquiring yes-no opinions and entity lists.  ...  This work was partially supported by National Natural Science Foundation of China (61572049) and Baidu-Peking University Joint Project.  ... 
doi:10.18653/v1/w18-2611 dblp:conf/acl/YangLLLL18 fatcat:pzqk3urdq5ea3iuxba7v4nxenu
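For context, the plain token-overlap F1 that such lexical-overlap metrics reduce to in the simplest case (a common MRC metric, not the paper's proposed adaptation):

```python
# Token-overlap F1 between a candidate and a reference answer.
from collections import Counter

def overlap_f1(candidate, reference):
    cand, ref = candidate.split(), reference.split()
    common = sum((Counter(cand) & Counter(ref)).values())
    if common == 0:
        return 0.0
    p, r = common / len(cand), common / len(ref)
    return 2 * p * r / (p + r)

# A correct but verbose yes-answer scores only 0.33 against "yes":
# the kind of question-type bias the abstract points at.
print(overlap_f1("yes the answer is yes", "yes"))
```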

Adaptations of ROUGE and BLEU to Better Evaluate Machine Reading Comprehension Task [article]

An Yang, Kai Liu, Jing Liu, Yajuan Lyu, Sujian Li
2018 arXiv   pre-print
Current evaluation metrics for question-answering-based machine reading comprehension (MRC) systems generally focus on the lexical overlap between the candidate and  ...  However, bias may appear when these metrics are used for specific question types, especially questions inquiring yes-no opinions and entity lists.  ...  This work was partially supported by National Natural Science Foundation of China (61572049) and Baidu-Peking University Joint Project.  ... 
arXiv:1806.03578v1 fatcat:g66b2rwv5jbnjie5vtwhzuuw5e

IN SEARCH OF NOVELTY

1973 The Lancet  
Tuning should therefore use an unbiased sample of actual search requests, a judging process accurately modelling that of real searchers, and measures optimally correlated with real satisfaction.  ...  DTDs for test and result files, plus an associated toolkit and C-TEST example testfiles for some TREC tasks, are available at es.csiro.au/C-TEST.  ...  Acknowledgements We gratefully acknowledge useful information provided by Jacques Savoy and Alexander Krumpholz.  ... 
doi:10.1016/s0140-6736(73)90676-4 fatcat:i2ifh6u7drajpfbnalymmukua4
Showing results 1 — 15 out of 545 results