Distilling Knowledge from Reader to Retriever for Question Answering [article]

Gautier Izacard, Edouard Grave
2022 arXiv   pre-print
In this paper, we propose a technique to learn retriever models for downstream tasks, inspired by knowledge distillation, and which does not require annotated pairs of query and documents.  ...  Our approach leverages attention scores of a reader model, used to solve the task based on retrieved documents, to obtain synthetic labels for the retriever.  ...  For example, many real world question answering systems start by retrieving a set of support documents from a large source of knowledge such as Wikipedia.  ... 
arXiv:2012.04584v2 fatcat:brb7ftiayfakdpmmxzjqyyok6m

COBERT: COVID-19 Question Answering System Using BERT

Jafar A. Alzubi, Rachna Jain, Anubhav Singh, Pritee Parwekar, Meenu Gupta
2021 Arabian Journal for Science and Engineering  
Taking these challenges into account we have proposed COBERT: a retriever-reader dual algorithmic system that answers the complex queries by searching a document of 59K corona virus-related literature  ...  The reader which is pre-trained Bidirectional Encoder Representations from Transformers (BERT) on SQuAD 1.1 dev dataset built on top of the HuggingFace BERT transformers, refines the sentences from the  ...  Ranker Distill: distillation is used for search knowledge, 2. LM Distill + Fine-tuning: distillation is serviced for LM knowledge, and 3.  ... 
doi:10.1007/s13369-021-05810-5 pmid:34178569 pmcid:PMC8220121 fatcat:dx4dkreeaba57gzayjxbnsinoy

Designing a Minimal Retrieve-and-Read System for Open-Domain Question Answering [article]

Sohee Yang, Minjoon Seo
2021 arXiv   pre-print
In open-domain question answering (QA), retrieve-and-read mechanism has the inherent benefit of interpretability and the easiness of adding, removing, or editing knowledge compared to the parametric approaches  ...  However, it is also known to suffer from its large storage footprint due to its document corpus and index.  ...  To train the unified retriever-reader through knowledge distillation, a batch  ...  the backbone from BERT to MobileBERT (§2.2.1) results in a significant relative performance drop  ... 
arXiv:2104.07242v2 fatcat:enlrajzwhnhszp5juwvbxzre2i

Learning Dense Representations of Phrases at Scale [article]

Jinhyuk Lee, Mujeen Sung, Jaewoo Kang, Danqi Chen
2021 arXiv   pre-print
Open-domain question answering can be reformulated as a phrase retrieval problem, without the need for processing documents on-demand during inference (Seo et al., 2019).  ...  Our model is easy to parallelize due to pure dense representations and processes more than 10 questions per second on CPUs.  ...  The retriever-reader approach retrieves a small number of relevant documents or passages from which the answers are extracted.  ... 
arXiv:2012.12624v3 fatcat:vjszqzioyng5djmut33b4fkm6y

A COVID-19 Search Engine (CO-SE) with Transformer-based Architecture [article]

Shaina Raza
2022 arXiv   pre-print
It also consists of a reader component that consists of a Transformer-based model, which is used to read the paragraphs and find the answers related to the query from the retrieved documents.  ...  The CO-SE has a retriever component trained on the TF-IDF vectorizer that retrieves the relevant documents from the system.  ...  Acknowledgments I would like to acknowledge that this research and manuscript is a part of my CIHR Health Systems Impact Fellowship.  ... 
arXiv:2206.03474v1 fatcat:gyqvy3tl7bbfhgvhk7kknmqiou

Relation-Guided Pre-Training for Open-Domain Question Answering [article]

Ziniu Hu, Yizhou Sun, Kai-Wei Chang
2021 arXiv   pre-print
We then pre-train a QA model to infer the latent relations from the question, and then conduct extractive QA to get the target answer entity.  ...  Answering complex open-domain questions requires understanding the latent relations between involving entities.  ...  Template-based question generation from retrieved sentences for improved unsupervised question answering.  ... 
arXiv:2109.10346v1 fatcat:vc7kawiku5b5heh7enlkgr2zia

DC-BERT: Decoupling Question and Document for Efficient Contextual Encoding [article]

Yuyu Zhang, Ping Nie, Xiubo Geng, Arun Ramamurthy, Le Song, Daxin Jiang
2020 arXiv   pre-print
open-domain question answering.  ...  On SQuAD Open and Natural Questions Open datasets, DC-BERT achieves 10x speedup on document retrieval, while retaining most (about 98%) of the QA performance compared to state-of-the-art approaches for  ...  reader module, making it harder to find the answer.  ... 
arXiv:2002.12591v1 fatcat:utlelcr4sjcbbg7wtikqdcl3k4

Introduction: A perspective on Web Information Retrieval

Massimo Melucci, David Hawking
2006 Information retrieval (Boston)  
Acknowledgements We thank the current Editors-in-Chief and Stephen Robertson, for support and the advice. We also thank the colleagues who helped select the papers: Maristella  ...  a methodology to extract information to answer questions similar to those typical of a Frequently Asked Questions list.  ...  We invite readers to also read papers from other special issues devoted to Web information retrieval and mining which may be seen to complement this issue.  ... 
doi:10.1007/s10791-006-7145-2 fatcat:db2kuzmb5vebnftd3jeowhzeve

Decoupled Transformer for Scalable Inference in Open-domain Question Answering [article]

Haytham ElFadeel, Stan Peshterliev
2021 arXiv   pre-print
To retain the decoupled transformer accuracy, we devised a knowledge distillation objective from a standard transformer model.  ...  Large transformer models, such as BERT, achieve state-of-the-art results in machine reading comprehension (MRC) for open-domain question answering (QA).  ...  Introduction Open-domain question answering (QA) aims to answer questions from a collection of text passages.  ... 
arXiv:2108.02765v1 fatcat:xy2fo5o4g5f3hdawmp4aij2ewe

End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering [article]

Devendra Singh Sachan, Siva Reddy, William Hamilton, Chris Dyer, Dani Yogatama
2021 arXiv   pre-print
We present an end-to-end differentiable training method for retrieval-augmented open-domain question answering systems that combine information from multiple retrieved documents when generating answers  ...  This results in a retriever that is able to select more relevant documents for a question and a reader that is trained on more accurate documents to generate an answer.  ...  Introduction Open-domain question answering (OpenQA) is a question answering task where the goal is to train a language model to produce an answer for a given question.  ... 
arXiv:2106.05346v2 fatcat:35iobifkqvb5tjmcn7njkcchae

Re2G: Retrieve, Rerank, Generate [article]

Michael Glass, Gaetano Rossiello, Md Faisal Mahbub Chowdhury, Ankita Rajaram Naik, Pengshan Cai, Alfio Gliozzo
2022 arXiv   pre-print
To train our system end-to-end, we introduce a novel variation of knowledge distillation to train the initial retrieval, reranker, and generation using only ground truth on the target sequence output.  ...  We find large gains in four diverse tasks: zero-shot slot filling, question answering, fact-checking, and dialog, with relative gains of 9% to 34% over the previous state-of-the-art on the KILT leaderboard  ...  Online knowledge distillation failed to improve for Wizard of Wikipedia and ensembling with BM25 failed to improve for Natural Questions.  ... 
arXiv:2207.06300v1 fatcat:feqv35tzd5gatlzkm2ulo5ciuy

NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned [article]

Sewon Min, Jordan Boyd-Graber, Chris Alberti, Danqi Chen, Eunsol Choi, Michael Collins, Kelvin Guu, Hannaneh Hajishirzi, Kenton Lee, Jennimaria Palomaki, Colin Raffel, Adam Roberts (+41 others)
2021 arXiv   pre-print
The competition focused on open-domain question answering (QA), where systems take natural language questions as input and return natural language answers.  ...  These memory budgets were designed to encourage contestants to explore the trade-off between storing retrieval corpora or the parameters of learned models.  ...  Acknowledgments We thank all the participants for taking part and making this a successful competition. We thank Google for providing prizes for computer participants.  ... 
arXiv:2101.00133v2 fatcat:i3pwpxnarzeghoo7ggjlqk57xm

Pruning the Index Contents for Memory Efficient Open-Domain QA [article]

Martin Fajcik, Martin Docekal, Karel Ondrej, Pavel Smrz
2021 arXiv   pre-print
Specifically, it proposes the novel R2-D2 (Rank twice, reaD twice) pipeline composed of retriever, passage reranker, extractive reader, generative reader and a simple way to combine them.  ...  This work presents a simple approach for pruning the contents of a massive index such that the open-domain QA system altogether with index, OS, and library components fits into 6GiB docker image while  ...  Acknowledgments We would like to thank Jan Doležal for implementing an R2-D2 demo.  ... 
arXiv:2102.10697v2 fatcat:rbkfghoainb37asci2etnhyzse

KG-FiD: Infusing Knowledge Graph in Fusion-in-Decoder for Open-Domain Question Answering [article]

Donghan Yu, Chenguang Zhu, Yuwei Fang, Wenhao Yu, Shuohang Wang, Yichong Xu, Xiang Ren, Yiming Yang, Michael Zeng
2022 arXiv   pre-print
Given an input question, the reading module predicts the answer from the relevant passages which are retrieved by the retriever.  ...  We initiate the passage node embedding from the FiD encoder and then use graph neural network (GNN) to update the representation for reranking.  ...  Acknowledgements We thank all the reviewers for their valuable comments. We also thank Woojeong Jin, Dong-Ho Lee, and Aaron Chan for useful discussions.  ... 
arXiv:2110.04330v2 fatcat:3j5yojxp5vgb7gffxuycgbinrq

Few-shot Learning with Retrieval Augmented Language Models [article]

Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane Dwivedi-Yu, Armand Joulin, Sebastian Riedel, Edouard Grave
2022 arXiv   pre-print
However, when knowledge is key for such results, as is the case for tasks such as question answering and fact checking, massive parameter counts to store knowledge seem to be needed.  ...  Retrieval augmented models are known to excel at knowledge intensive tasks without the need for as many parameters, but it is unclear whether they work in few-shot settings.  ...  text occurs more frequently in retrieved passages, rising from 55% for questions when the answer option does not appear, to 77% for questions mentioned more than 15 times.  ... 
arXiv:2208.03299v2 fatcat:yr23jj67srhmfi2riztlie4yx4