Panning requirement nuggets in stream of software maintenance tickets
2014
Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering - FSE 2014
In this paper, we propose an approach to automatically analyze problem tickets to discover groups of problems being reported in them and provide meaningful, descriptive labels to help interpret these groups ...
We provide detailed experiments to quantitatively and qualitatively evaluate our approach. ...
Every ticket had a Title field containing a short description of the problem. All subjects had tickets created both automatically (by systems) and by humans. ...
doi:10.1145/2635868.2635897
dblp:conf/sigsoft/ManiSSD14
fatcat:7wnl5tplfnfl3lgqzx4u73snai
Nuggeteer
2006
Proceedings of the main conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics
unpublished
Human evaluators provide informal descriptions of each nugget, and judgements (assignments of nuggets to responses) for each response submitted by participants. ...
Nuggeteer, by contrast, uses both the human descriptions and the human judgements, and makes binary decisions about each response, so that the end result is as interpretable as the official score. ...
Approach Nuggeteer builds one binary classifier per nugget for each question, based on n-grams (up to trigrams) in the description and optionally in any provided judgement files. ...
doi:10.3115/1220835.1220883
fatcat:cirswvfslreanarbocpom63uiy
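The Nuggeteer snippet above describes building one binary classifier per nugget from n-grams (up to trigrams) in the nugget description. A minimal illustrative sketch of that idea follows; it is an assumption-level reconstruction, not the actual Nuggeteer code, and the overlap threshold is a hypothetical parameter.

```python
# Illustrative sketch of a per-nugget binary classifier based on
# n-gram overlap (unigrams through trigrams), in the spirit of
# Nuggeteer's description. Not the authors' implementation.

def ngrams(tokens, n_max=3):
    """Collect all n-grams of the token list for n = 1..n_max."""
    grams = set()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(tuple(tokens[i:i + n]))
    return grams

def make_nugget_classifier(description, threshold=0.3):
    """Return a binary classifier for one nugget, keyed on its description."""
    desc_grams = ngrams(description.lower().split())
    def contains_nugget(response):
        resp_grams = ngrams(response.lower().split())
        overlap = len(desc_grams & resp_grams) / len(desc_grams)
        return overlap >= threshold  # binary decision, like an official judgement
    return contains_nugget

clf = make_nugget_classifier("the capital of France is Paris")
print(clf("Paris is the capital of France"))  # True
print(clf("the weather was sunny today"))     # False
```

Because each nugget gets its own classifier, the end result stays interpretable: each response either contains the nugget or it does not, mirroring the binary human judgements.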
Northeastern University Runs at the TREC12 Crowdsourcing Track
2012
Text Retrieval Conference
These preferences are then extended to relevance judgments through the use of expectation maximization and the Elo rating system. Our third approach is based on our Nugget-based evaluation paradigm. ...
The goal of the TREC 2012 Crowdsourcing Track was to evaluate approaches to crowdsourcing high quality relevance judgments for images and text documents. ...
The nuggets are used to find relevant documents, using a technique based on our nugget-based evaluation framework [9] . ...
dblp:conf/trec/BashirAWEGPA12
fatcat:v2ja3fo7wfgwxamffgyjc7nrsm
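The snippet above mentions extending crowdsourced preferences to relevance judgements via expectation maximization and the Elo rating system. A minimal sketch of a single Elo update for one preference pair is shown below; the K-factor and starting ratings are illustrative assumptions, not the authors' settings.

```python
# Illustrative Elo update for a pairwise document preference:
# the preferred ("winning") document gains rating, the other loses it.
# Constants (K=32, base rating 1500) are conventional Elo defaults,
# not taken from the TREC 2012 Crowdsourcing Track submission.

def elo_update(r_winner, r_loser, k=32):
    """Return updated (winner, loser) ratings after one preference."""
    expected_win = 1 / (1 + 10 ** ((r_loser - r_winner) / 400))
    delta = k * (1 - expected_win)
    return r_winner + delta, r_loser - delta

ratings = {"docA": 1500.0, "docB": 1500.0}
# A crowd worker prefers docA over docB for some topic:
ratings["docA"], ratings["docB"] = elo_update(ratings["docA"], ratings["docB"])
print(ratings)  # docA rises to 1516.0, docB falls to 1484.0
```

Aggregated over many preference pairs, the resulting ratings induce a ranking from which graded relevance judgements can be derived.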
Bridging the gap between extractive and abstractive summaries: Creation and evaluation of coherent extracts from heterogeneous sources
2016
International Conference on Computational Linguistics
Coherent extracts are a novel type of summary combining the advantages of manually created abstractive summaries, which are fluent but difficult to evaluate, and low-quality automatically created extractive ...
We find that our manually created corpus is of high quality and that it has the potential to bridge the gap between reference corpora of abstracts and automatic methods producing extracts. ...
such as coherence and structure, which are important factors in human judgements, are not covered by a state-of-the-art ROUGE-based evaluation. ...
dblp:conf/coling/BenikovaMMG16
fatcat:er664p2ovzfztiii7odl6mllgq
Overview of WebCLEF 2007
[chapter]
2008
Lecture Notes in Computer Science
The WebCLEF 2007 task combines insights gained from previous editions of WebCLEF 2005-2006 [6, 1] and the WiQA 2006 pilot [4, 3], and goes beyond the navigational queries considered at WebCLEF 2005 and ...
The task definition-which goes beyond traditional navigational queries and is concerned with undirected information search goals-combines insights gained at previous editions of WebCLEF and of the WiQA ...
As a consequence, we did not use nugget-based measures for evaluation.
Runs In total, 12 runs were submitted from 4 research groups. ...
doi:10.1007/978-3-540-85760-0_92
fatcat:b6oargknsvbtpjyxrrqeieuqiq
A Hybrid Approach for QA Track Definitional Questions
2003
Text Retrieval Conference
We present an overview of DefScriber, a system developed at Columbia University that combines knowledge-based and statistical methods to answer definitional questions of the form, "What is X?" ...
We are also thankful for the generous and thoughtful contributions of colleagues at Columbia University and at the University of Colorado-Boulder. ...
These judgements may be due in part to the need for higher level inference over response sentences and nuggets to see their connections. ...
dblp:conf/trec/Blair-GoldensohnMS03
fatcat:tp5k2gqhg5bddkymorl6dmsko4
The effect of expanding relevance judgements with duplicates
2014
Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval - SIGIR '14
We recommend adding exact duplicate sentences to the set of relevance judgements in order to obtain a more accurate estimate of system performance. ...
Including new sentences that are exact duplicates of the previously judged sentences may allow for better estimation of performance metrics and enhance the reusability of a test collection. ...
Nugget-based evaluation [4, 6] -where identified relevant material is representative of relevance -is aimed towards automatic identification of nuggets in the whole collection. ...
doi:10.1145/2600428.2609534
dblp:conf/sigir/BaruahRS14
fatcat:ok4pdqz63rfazaoondunprjpni
RPI BLENDER TAC-KBP2015 System Description
2015
Text Analysis Conference
The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government. ...
W911NF-09-2-0053, NSF CAREER Award IIS-1523198, AFRL DREAM project, gift awards from IBM, Google and Bosch. ...
In this event nugget coreference resolution we extend it to retrieve related images automatically, though we didn't use it for the evaluation because we are not allowed to use web access. ...
dblp:conf/tac/HongLYPHJHWC15
fatcat:x6xbtxnlunf77fikippeiwsbh4
Answering Multiple Questions on a Topic From Heterogeneous Resources
2004
Text Retrieval Conference
Another problem comparing precision was that we used data from the TREC12 judgements, and possibly some of the data from the newer pattern judgements, in the evaluation. ...
In each case, they were asked to mark target-nugget pairs, where the nugget was a good description of the target. ...
dblp:conf/trec/KatzBFFHKLLMMU04
fatcat:6o74z4ry6vd3bekp76bo2d372q
Human question answering performance using an interactive document retrieval system
2012
Proceedings of the 4th Information Interaction in Context Symposium on - IIIX '12
To achieve superior performance, future QA systems should combine the flexibility and precision of IR systems with the ease-of-use and recall advantages of QA systems. ...
Every day, people widely use information retrieval (IR) systems to find documents that answer their questions. ...
Evaluation The ciQA track uses a nugget-based evaluation. Each run may return as many answers to each question as desired up to a 7000 non-whitespace character limit. ...
doi:10.1145/2362724.2362735
dblp:conf/iiix/SmuckerAD12
fatcat:brijt767lfgwnhmgl2otpt4wji
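The snippet above notes that the ciQA track uses a nugget-based evaluation with a 7000 non-whitespace-character answer limit. A sketch of the commonly cited TREC-style nugget F-score (recall over vital nuggets, precision from a per-nugget length allowance, F with beta = 3) is given below; the constants follow the widely published definitional-QA scoring formula and are stated here as an assumption, not as the ciQA track's verbatim rules.

```python
# Sketch of the TREC-style nugget F-score. Recall counts matched vital
# nuggets; precision is a length penalty: each matched nugget grants a
# 100-character allowance, and answers longer than the total allowance
# are penalized proportionally. Beta=3 weights recall over precision.

def nugget_f(num_vital, vital_matched, okay_matched, answer_length, beta=3):
    recall = vital_matched / num_vital
    allowance = 100 * (vital_matched + okay_matched)
    if answer_length < allowance:
        precision = 1.0
    else:
        precision = 1 - (answer_length - allowance) / answer_length
    if precision == 0 and recall == 0:
        return 0.0
    return ((beta ** 2 + 1) * precision * recall) / (beta ** 2 * precision + recall)

# 3 of 5 vital nuggets matched, plus 1 okay nugget, in a 500-character answer:
score = nugget_f(num_vital=5, vital_matched=3, okay_matched=1, answer_length=500)
print(round(score, 3))  # 0.615
```

Under this scheme a run can return many answers per question, but verbosity beyond the earned allowance directly lowers precision, which is what makes the character limit meaningful.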
Real-Time Web Scale Event Summarization Using Sequential Decision Making
[article]
2016
arXiv
pre-print
We present a system based on sequential decision making for the online summarization of massive document streams, such as those found on the web. Given an event of interest (e.g. ...
We demonstrate a 28.3% improvement in summary F1 and a 43.8% improvement in time-sensitive F1 metrics. ...
Any opinions, findings and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the NSF. ...
arXiv:1605.03664v1
fatcat:2ep3nkomk5bx5esns2lgh2ncry
Overview of WebCLEF 2008
[chapter]
2009
Lecture Notes in Computer Science
We detail the task, the assessment procedure, the evaluation measures and results. ...
We also give an analysis of the evaluation measures and differences between the participating systems. ...
This is unfortunate, because, as [6] argues, the strict precision/recall-based evaluation of the task does not allow us to reuse the human judgements for evaluating runs that humans have not assessed ...
doi:10.1007/978-3-642-04447-2_102
fatcat:l2azihwoazg7hoqs4hwgnzyuoq
Adaptations of ROUGE and BLEU to Better Evaluate Machine Reading Comprehension Task
2018
Proceedings of the Workshop on Machine Reading for Question Answering
Current evaluation metrics to question answering based machine reading comprehension (MRC) systems generally focus on the lexical overlap between candidate and reference answers, such as ROUGE and BLEU ...
However, bias may appear when these metrics are used for specific question types, especially questions inquiring yes-no opinions and entity lists. ...
This work was partially supported by National Natural Science Foundation of China (61572049) and Baidu-Peking University Joint Project. ...
doi:10.18653/v1/w18-2611
dblp:conf/acl/YangLLLL18
fatcat:pzqk3urdq5ea3iuxba7v4nxenu
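The abstract above observes that MRC evaluation metrics such as ROUGE and BLEU score lexical overlap between candidate and reference answers. As a concrete illustration of what "lexical overlap" means here, the sketch below computes a minimal ROUGE-L-style recall via longest common subsequence over tokens; it is illustrative only, not the official ROUGE implementation.

```python
# Minimal ROUGE-L recall sketch: longest common subsequence (LCS) of the
# candidate and reference token sequences, divided by reference length.
# Standard dynamic-programming LCS; not the official ROUGE toolkit.

def lcs_len(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

ref = "the cat sat on the mat".split()
cand = "the cat lay on the mat".split()
print(lcs_len(cand, ref) / len(ref))  # 5/6: "sat" vs "lay" is the only miss
```

Note how a yes-no answer like "no" can score highly against "yes" plus shared context tokens under such a metric, which is exactly the bias for specific question types that the paper targets.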
Adaptations of ROUGE and BLEU to Better Evaluate Machine Reading Comprehension Task
[article]
2018
arXiv
pre-print
Current evaluation metrics to question answering based machine reading comprehension (MRC) systems generally focus on the lexical overlap between the candidate and reference answers, such as ROUGE and ...
However, bias may appear when these metrics are used for specific question types, especially questions inquiring yes-no opinions and entity lists. ...
This work was partially supported by National Natural Science Foundation of China (61572049) and Baidu-Peking University Joint Project. ...
arXiv:1806.03578v1
fatcat:g66b2rwv5jbnjie5vtwhzuuw5e
IN SEARCH OF NOVELTY
1973
The Lancet
Tuning should therefore use an unbiased sample of actual search requests, a judging process accurately modelling that of real searchers, and measures optimally correlated with real satisfaction. ...
DTDs for test and result files, plus an associated toolkit and C-TEST example testfiles for some TREC tasks, are available at es.csiro.au/C-TEST. ...
Acknowledgements We gratefully acknowledge useful information provided by Jacques Savoy and Alexander Krumpholz. ...
doi:10.1016/s0140-6736(73)90676-4
fatcat:i2ifh6u7drajpfbnalymmukua4