30 Hits in 1.8 sec

Quoref: A Reading Comprehension Dataset with Questions Requiring Coreferential Reasoning [article]

Pradeep Dasigi, Nelson F. Liu, Ana Marasović, Noah A. Smith, Matt Gardner
2019 arXiv   pre-print
We present a new crowdsourced dataset containing more than 24K span-selection questions that require resolving coreference among entities in over 4.7K English paragraphs from Wikipedia.  ...  Machine comprehension of texts longer than a single sentence often requires coreference resolution.  ...  Thanks to HuggingFace for releasing pytorch-transformers, and to Dheeru Dua for sharing with us the crowdsourcing setup used for DROP.  ... 
arXiv:1908.05803v2

Quoref: A Reading Comprehension Dataset with Questions Requiring Coreferential Reasoning

Pradeep Dasigi, Nelson F. Liu, Ana Marasović, Noah A. Smith, Matt Gardner
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)  
We present a new crowdsourced dataset containing more than 24K span-selection questions that require resolving coreference among entities in over 4.7K English paragraphs from Wikipedia.  ...  Machine comprehension of texts longer than a single sentence often requires coreference resolution.  ...  Thanks to HuggingFace for releasing pytorch-transformers, and to Dheeru Dua for sharing with us the crowdsourcing setup used for DROP.  ... 
doi:10.18653/v1/d19-1606 dblp:conf/emnlp/DasigiLMSG19

Coreferential Reasoning Learning for Language Representation [article]

Deming Ye, Yankai Lin, Jiaju Du, Zhenghao Liu, Peng Li, Maosong Sun, Zhiyuan Liu
2020 arXiv   pre-print
The experimental results show that, compared with existing baseline models, CorefBERT can achieve significant improvements consistently on various downstream NLP tasks that require coreferential reasoning  ...  To address this issue, we present CorefBERT, a novel language representation model that can capture the coreferential relations in context.  ...  We first evaluate models on Questions Requiring Coreferential Reasoning dataset (QUOREF) (Dasigi et al., 2019).  ... 
arXiv:2004.06870v2

Tracing Origins: Coreference-aware Machine Reading Comprehension [article]

Baorong Huang, Zhuosheng Zhang, Hai Zhao
2022 arXiv   pre-print
... language model, in order to highlight the coreference mentions of the entities that must be identified for coreference-intensive question answering in QUOREF, a relatively new dataset that is specifically ... Machine reading comprehension is a heavily-studied research and test field for evaluating new pre-trained language models (PrLMs) and fine-tuning strategies, and recent studies have enriched the pre-trained ... QUOREF dataset (Dasigi et al., 2019) is specifically designed to validate the performance of the models in coreferential reasoning, in that "78% of the manually analyzed questions cannot be answered ...
arXiv:2110.07961v2

Comparing Test Sets with Item Response Theory [article]

Clara Vania, Phu Mon Htut, William Huang, Dhara Mungra, Richard Yuanzhe Pang, Jason Phang, Haokun Liu, Kyunghyun Cho, Samuel R. Bowman
2021 arXiv   pre-print
We find that Quoref, HellaSwag, and MC-TACO are best suited for distinguishing among state-of-the-art models, while SNLI, MNLI, and CommitmentBank seem to be saturated for current strong models. ... What kind of datasets are still effective at discriminating among strong models, and what kind of datasets should we expect to be able to detect future improvements? ... Liu, Ana Marasović, Noah A. Smith, and Matt Gardner. 2019. Quoref: A reading comprehension dataset with questions requiring coreferential reasoning. ...
arXiv:2106.00840v1

MOCHA: A Dataset for Training and Evaluating Generative Reading Comprehension Metrics [article]

Anthony Chen, Gabriel Stanovsky, Sameer Singh, Matt Gardner
2020 arXiv   pre-print
Posing reading comprehension as a generation problem provides a great deal of flexibility, allowing for open-ended questions with few restrictions on possible answers.  ...  To address this, we introduce a benchmark for training and evaluating generative reading comprehension metrics: MOdeling Correctness with Human Annotations.  ...  N660011924033 with the United States Office Of Naval Research.  ... 
arXiv:2010.03636v2

MixQG: Neural Question Generation with Mixed Answer Types [article]

Lidiya Murakhovs'ka, Chien-Sheng Wu, Tong Niu, Wenhao Liu, Caiming Xiong
2021 arXiv   pre-print
We combine 9 question answering datasets with diverse answer types, including yes/no, multiple-choice, extractive, and abstractive answers, to train a single generative model.  ...  In this paper, we propose a neural question generator, MixQG, to bridge this gap.  ...  Liu, Ana Marasović, Noah A. Smith, and Matt Gardner. 2019. Quoref: A reading comprehension dataset with questions requiring coreferential reasoning. In Proc. of EMNLP-IJCNLP.  ... 
arXiv:2110.08175v1

Benchmarking Machine Reading Comprehension: A Psychological Perspective [article]

Saku Sugawara, Pontus Stenetorp, Akiko Aizawa
2021 arXiv   pre-print
... reading comprehension by a model cannot be explained in human terms. ... Machine reading comprehension (MRC) has received considerable attention as a benchmark for natural language understanding. ... Acknowledgments: The authors would like to thank Xanh Ho for helping create the dataset list and the anonymous reviewers for their insightful comments. ...
arXiv:2004.01912v2

Beat the AI: Investigating Adversarial Human Annotation for Reading Comprehension [article]

Max Bartolo, Alastair Roberts, Johannes Welbl, Sebastian Riedel, Pontus Stenetorp
2020 arXiv   pre-print
Innovations in annotation methodology have been a catalyst for Reading Comprehension (RC) datasets and models. ... When trained on data collected with a BiDAF model in the loop, RoBERTa achieves 39.9 F1 on questions that it cannot answer when trained on SQuAD - only marginally lower than when trained on data collected ... and innovation programme under grant agreement No. 875160 and the UK Defence Science and Technology Laboratory (Dstl) and Engineering and Physical Sciences Research Council (EPSRC) under grant EP/R018693/1 as a ...
arXiv:2002.00293v2

Unsupervised Pronoun Resolution via Masked Noun-Phrase Prediction [article]

Ming Shen, Pratyay Banerjee, Chitta Baral
2021 arXiv   pre-print
Our method outperforms RoBERTa-large baseline with large margins, meanwhile, achieving a higher AUC score after further finetuning on the remaining three official splits of WinoGrande. ... Firstly, we evaluate our pre-trained model on various pronoun resolution datasets without any finetuning. Our method outperforms all previous unsupervised methods on all datasets by large margins. ... Liu, Ana Marasović, Noah A. Smith, and Matt Gardner. 2019. Quoref: A reading comprehension dataset with questions requiring coreferential reasoning. ...
arXiv:2105.12392v2

To Test Machine Comprehension, Start by Defining Comprehension [article]

Jesse Dunietz, Gregory Burnham, Akash Bharadwaj, Owen Rambow, Jennifer Chu-Carroll, David Ferrucci
2020 arXiv   pre-print
Many tasks aim to measure machine reading comprehension (MRC), often focusing on question types presumed to be difficult. ... Second, we present a detailed definition of comprehension -- a "Template of Understanding" -- for a widely useful class of texts, namely short narratives. ... Liu, Ana Marasović, Noah A. Smith, and Matt Gardner. 2019. Quoref: A reading comprehension dataset with questions requiring coreferential reasoning. ...
arXiv:2005.01525v2

CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP [article]

Qinyuan Ye, Bill Yuchen Lin, Xiang Ren
2021 arXiv   pre-print
... NLP datasets and converted to a unified text-to-text format. ... Humans can learn a new language task efficiently with only few examples, by leveraging their knowledge obtained when learning prior tasks. ... Acknowledgments: We thank authors and crowd-workers of all datasets used in our study. We thank the HuggingFace datasets team for making datasets more accessible. ...
arXiv:2104.08835v2

SpartQA: A Textual Question Answering Benchmark for Spatial Reasoning [article]

Roshanak Mirzaee, Hossein Rajaby Faghihi, Qiang Ning, Parisa Kordjamshidi
2021 arXiv   pre-print
This paper proposes a question-answering (QA) benchmark for spatial reasoning on natural language text which contains more realistic spatial phenomena not covered by prior work and is challenging for state-of-the-art ... Specifically, we design grammar and reasoning rules to automatically generate a spatial description of visual scenes and corresponding QA pairs. ... Liu, Ana Marasović, Noah A. Smith, and Matt Gardner. 2019. Quoref: A reading comprehension dataset with questions requiring coreferential reasoning. ...
arXiv:2104.05832v1

Coreferential Reasoning Learning for Language Representation

Deming Ye, Yankai Lin, Jiaju Du, Zhenghao Liu, Peng Li, Maosong Sun, Zhiyuan Liu
2020 Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)   unpublished
The experimental results show that, compared with existing baseline models, CorefBERT can achieve significant improvements consistently on various downstream NLP tasks that require coreferential reasoning  ...  To address this issue, we present CorefBERT, a novel language representation model that can capture the coreferential relations in context.  ...  We first evaluate models on Questions Requiring Coreferential Reasoning dataset (QUOREF) (Dasigi et al., 2019).  ... 
doi:10.18653/v1/2020.emnlp-main.582

What to Pre-Train on? Efficient Intermediate Task Selection [article]

Clifton Poth, Jonas Pfeiffer, Andreas Rücklé, Iryna Gurevych
2021 arXiv   pre-print
We experiment with a diverse set of 42 intermediate and 11 target English classification, multiple choice, question answering, and sequence tagging tasks.  ...  With an abundance of candidate datasets as well as pre-trained language models, it has become infeasible to run the cross-product of all combinations to find the best transfer setting.  ...  We thank Leonardo Ribeiro and the anonymous reviewers for insightful feedback and suggestions on a draft of this paper.  ... 
arXiv:2104.08247v2
Showing results 1 — 15 out of 30 results