
ELI5: Long Form Question Answering

Angela Fan, Yacine Jernite, Ethan Perez, David Grangier, Jason Weston, Michael Auli
2019 Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics  
We introduce the first large-scale corpus for long-form question answering, a task requiring elaborate and in-depth answers to open-ended questions.  ...  Compared to existing datasets, ELI5 comprises diverse questions requiring multi-sentence answers. We provide a large set of web documents to help answer the question.  ...  ELI5 contains long-form answers with an average length of 6.6 sentences, or 130 words.  ... 
doi:10.18653/v1/p19-1346 dblp:conf/acl/FanJPGWA19 fatcat:jpnqlphrt5d7fdidry5cqx4usy

Hurdles to Progress in Long-form Question Answering [article]

Kalpesh Krishna, Aurko Roy, Mohit Iyyer
2021 arXiv   pre-print
The task of long-form question answering (LFQA) involves retrieving documents relevant to a given question and using them to generate a paragraph-length answer.  ...  ELI5 contains significant train/validation overlap, as at least 81% of ELI5 validation questions occur in paraphrased form in the training set; (3) ROUGE-L is not an informative metric of generated answer  ...  Longer outputs get higher ROUGE-L. A summary of the major hurdles (a-d) to progress in long-form question answering with ELI5.  ... 
arXiv:2103.06332v2 fatcat:gdogrseicrbefbshlfm7nnv5zy
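The snippet above notes that longer outputs get higher ROUGE-L. ROUGE-L scores a candidate by the longest common subsequence (LCS) it shares with a reference, so a longer answer has more chances to cover reference tokens and raise recall. Below is a minimal sketch, assuming whitespace tokenization, lowercasing and the balanced F1 form of the score; it is not the scorer configuration used in the paper, and the names `lcs_length` and `rouge_l` and the example sentences are hypothetical.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    # Dynamic-programming LCS length, keeping only one previous row of the table.
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            if x == y:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> dict[str, float]:
    # Hypothetical helper: lowercased whitespace tokens, F1 (beta = 1) combination.
    cand = candidate.lower().split()
    ref = reference.lower().split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = lcs / len(cand)  # share of candidate tokens on the LCS
    recall = lcs / len(ref)      # share of reference tokens on the LCS
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    ref = "the sky is blue because air scatters blue light"
    print(rouge_l("the sky is blue", ref)["f1"])  # about 0.615
    print(rouge_l("the sky is blue because of how the air scatters the blue part of light", ref)["f1"])  # about 0.75
```

In this toy case the padded answer scores higher only because it contains every reference token in order: its precision drops to 0.6, but its recall rises to 1.0, which lifts the F1.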

How Do We Answer Complex Questions: Discourse Structure of Long-form Answers [article]

Fangyuan Xu, Junyi Jessy Li, Eunsol Choi
2022 arXiv   pre-print
To better understand this complex and understudied task, we study the functional structure of long-form answers collected from three datasets, ELI5, WebGPT and Natural Questions.  ...  Long-form answers, consisting of multiple sentences, can provide nuanced and comprehensive answers to a broader set of questions.  ...  information presented in a long-form answer.  ... 
arXiv:2203.11048v1 fatcat:dejy4jem3fantbna2qomvaltiy

New Methods & Metrics for LFQA tasks [article]

Suchismit Mahapatra, Vladimir Blagojevic, Pablo Bertorello, Prasanna Kumar
2021 arXiv   pre-print
Long-form question answering (LFQA) tasks require retrieving the documents pertinent to a query, using them to form a paragraph-length answer.  ...  Despite considerable progress in LFQA modeling, fundamental issues impede its progress: i) train/validation/test dataset overlap, ii) absence of automatic metrics and iii) generated answers not being "  ...  Enter long-form question answering (LFQA), which remains a fundamental challenge in natural language processing (NLP).  ... 
arXiv:2112.13432v1 fatcat:neirmp7a2rcsvcslme6wujtqeu

GooAQ: Open Question Answering with Diverse Answer Types [article]

Daniel Khashabi, Amos Ng, Tushar Khot, Ashish Sabharwal, Hannaneh Hajishirzi, Chris Callison-Burch
2021 arXiv   pre-print
This yields a rich space of answer types, containing both textual answers (short and long) as well as more structured ones such as collections.  ...  While day-to-day questions come with a variety of answer types, the current question-answering (QA) literature has failed to adequately address the answer diversity of questions.  ...  One notable QA dataset with long-form responses is ELI5 (Fan et al., 2019; Krishna et al., 2021), containing questions/answers mined from Reddit forums.  ... 
arXiv:2104.08727v2 fatcat:zloaxrwk2re47afc7luqacf5my

WebGPT: Browser-assisted question-answering with human feedback [article]

Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe (+6 others)
2022 arXiv   pre-print
We fine-tune GPT-3 to answer long-form questions using a text-based web-browsing environment, which allows the model to search and navigate the web.  ...  We train and evaluate our models on ELI5, a dataset of questions asked by Reddit users.  ...  For both demonstrations and comparisons, the vast majority of questions were taken from ELI5 [Fan et al., 2019], a dataset of long-form questions.  ... 
arXiv:2112.09332v2 fatcat:qmzgb4x6fnfynhewen4bor6wvu

Teaching language models to support answers with verified quotes [article]

Jacob Menick, Maja Trebacz, Vladimir Mikulik, John Aslanides, Francis Song, Martin Chadwick, Mia Glaese, Susannah Young, Lucy Campbell-Gillingham, Geoffrey Irving, Nat McAleese
2022 arXiv   pre-print
We measure the performance of GopherCite by conducting human evaluation of answers to questions in a subset of the NaturalQuestions and ELI5 datasets.  ...  Recent large language models often answer factual questions correctly.  ...  " and "long-answer" fields.  ... 
arXiv:2203.11147v1 fatcat:xcyia7pag5ayxmbnhbvjkzyrc4

Using Local Knowledge Graph Construction to Scale Seq2Seq Models to Multi-Document Inputs

Angela Fan, Claire Gardent, Chloé Braud, Antoine Bordes
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)  
For two generative tasks with very long text input, long-form question answering and multi-document summarization, feeding graph representations as input can achieve better performance than using retrieved  ...  Lattice CNNs for matching based Chinese question answering. arXiv preprint arXiv:1902.09087. Smith. 2018a.  ...  ELI5: Long form question answering. In Proceedings of ACL 2019. Angela Fan, Mike Lewis, and Yann Dauphin. 2018. Hierarchical neural story generation. arXiv preprint arXiv:1805.04833.  ... 
doi:10.18653/v1/d19-1428 dblp:conf/emnlp/FanGBB19 fatcat:u4uju6ob7rcnvd3mqcavkgc6ey

Using Local Knowledge Graph Construction to Scale Seq2Seq Models to Multi-Document Inputs [article]

Angela Fan, Claire Gardent, Chloe Braud, Antoine Bordes
2019 arXiv   pre-print
For two generative tasks with very long text input, long-form question answering and multi-document summarization, feeding graph representations as input can achieve better performance than using retrieved  ...  Query-based open-domain NLP tasks require information synthesis from long and diverse web results.  ...  ELI5: Long form question answering. In Proceedings of ACL 2019. Angela Fan, Mike Lewis, and Yann Dauphin. 2018. Hierarchical neural story generation. In ACL.  ... 
arXiv:1910.08435v1 fatcat:2epldhegyfbsxmqtb5jngtfgra

Improving Conditioning in Context-Aware Sequence to Sequence Models [article]

Xinyi Wang, Jason Weston, Michael Auli, Yacine Jernite
2019 arXiv   pre-print
In this work, we focus on cases where generation is conditioned on both a short query and a long context, such as abstractive question answering or document-level translation.  ...  ELI5: Long Form Question Answering. We first apply our approach to the recently published ELI5 dataset (Fan et al. 2019) for LFQA.  ...  We apply our approach to three context-aware seq2seq tasks: neural machine translation with document-level context, long form question answering, where the system needs to provide a paragraph-length answer  ... 
arXiv:1911.09728v1 fatcat:ud5cahaphndcdoag4i5ozmsvm4

Discourse Comprehension: A Question Answering Framework to Represent Sentence Connections [article]

Wei-Jen Ko, Cutter Dalton, Mark Simmons, Eliza Fisher, Greg Durrett, Junyi Jessy Li
2022 arXiv   pre-print
DCQA captures both discourse and semantic links between sentences in the form of free-form, open-ended questions.  ...  all) requires high cognitive load for annotators over long documents.  ...  The most related dataset we found is ELI5 (Fan et al., 2019), a dataset for long-form question answering.  ... 
arXiv:2111.00701v2 fatcat:5mklcwac7fg35nk2r4cc5thatq

CCQA: A New Web-Scale Question Answering Dataset for Model Pre-Training [article]

Patrick Huber, Armen Aghajanyan, Barlas Oğuz, Dmytro Okhonko, Wen-tau Yih, Sonal Gupta, Xilun Chen
2022 arXiv   pre-print
In our experiments, we find that pre-training question-answering models on our Common Crawl Question Answering dataset (CCQA) achieves promising results in zero-shot, low resource and fine-tuned settings  ...  With the rise of large-scale pre-trained language models, open-domain question-answering (ODQA) has become an important research topic in NLP.  ...  ELI5, introduced by Fan et al. (2019), constitutes the first large-scale long-form dataset for open-ended question-answering.  ... 
arXiv:2110.07731v2 fatcat:a7iutxmskzhy3hnsk74g7zu5rm

KILT: a Benchmark for Knowledge Intensive Language Tasks [article]

Fabio Petroni, Aleksandra Piktus, Angela Fan, Patrick Lewis, Majid Yazdani, Nicola De Cao, James Thorne, Yacine Jernite, Vladimir Karpukhin, Jean Maillard, Vassilis Plachouras, Tim Rocktäschel (+1 other)
2021 arXiv   pre-print
Challenging problems such as open-domain question answering, fact checking, slot filling and entity linking require access to large, external knowledge sources.  ...  We find that a shared dense vector index coupled with a seq2seq model is a strong baseline, outperforming more tailor-made approaches for fact checking, open-domain question answering and dialogue, and  ...  ELI5: long form question answering.  ... 
arXiv:2009.02252v4 fatcat:44gk5nyhrvckriymi7wsrxze7m

The Web Is Your Oyster – Knowledge-Intensive NLP against a Very Large Web Corpus [article]

Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Dmytro Okhonko, Samuel Broscheit, Gautier Izacard, Patrick Lewis, Barlas Oğuz, Edouard Grave, Wen-tau Yih, Sebastian Riedel
2021 arXiv   pre-print
Hurdles to progress in long-form question answering.  ...  ELI5: Long form question answering.  ...  Sebastian Borgeaud, Arthur Mensch, Jordan Hoff-  ... 
arXiv:2112.09924v1 fatcat:khcg2qe2trho3b7useq5navfye

Knowledge Infused Decoding [article]

Ruibo Liu, Guoqing Zheng, Shashank Gupta, Radhika Gaonkar, Chongyang Gao, Soroush Vosoughi, Milad Shokouhi, Ahmed Hassan Awadallah
2022 arXiv   pre-print
Our experiments find baseline methods tend to generate off-topic and hallucinatory answers when the expected answer length is long (e.g., ELI5 and PIQA).  ...  We first sample 200 ELI5 test set questions and generate answers of various lengths {80, 100, ..., 260} (260 is the average sequence length in the training set) with beam search, sampling, reflective (West  ...  Answer is from 1 (not similar at all) to 7 (very much similar)).  ... 
arXiv:2204.03084v1 fatcat:t3r6mutr7nbflg7uovqo2waapi
Showing results 1 to 15 out of 89.