1,528 Hits in 6.1 sec

Cross-Lingual Training with Dense Retrieval for Document Retrieval [article]

Peng Shi, Rui Zhang, He Bai, Jimmy Lin
2021 arXiv   pre-print
However, its effectiveness in document retrieval for non-English languages remains unexplored due to limited training resources.  ...  Dense retrieval has shown great success in passage ranking in English.  ...  Wikipedia has documents in various languages, and it is a good transfer set for cross-lingual training.  ... 
arXiv:2109.01628v1 fatcat:ogrosnckcjcjlco3hdb4cqsrwe

Addressing Issues of Cross-Linguality in Open-Retrieval Question Answering Systems For Emergent Domains [article]

Alon Albalak, Sharon Levy, William Yang Wang
2022 arXiv   pre-print
In this paper, we demonstrate a cross-lingual open-retrieval question answering system for the emergent domain of COVID-19.  ...  We show that a deep semantic retriever greatly benefits from training on our English-to-all data and significantly outperforms a BM25 baseline in the cross-lingual setting.  ...  Cross-Lingual Dense Retrieval: Training a dense retriever is challenging in low-resource settings, such as emergent domains, due to the data-hungry nature of large language models.  ... 
arXiv:2201.11153v1 fatcat:ion7747l4vb6nmkwnk3hsul76e

DuReader_retrieval: A Large-scale Chinese Benchmark for Passage Retrieval from Web Search Engine [article]

Yifu Qiu, Hongyu Li, Yingqi Qu, Ying Chen, Qiaoqiao She, Jing Liu, Hua Wu, Haifeng Wang
2022 arXiv   pre-print
Additionally, we provide two out-of-domain testing sets for cross-domain evaluation, as well as a cross-lingual set that has been manually translated for cross-lingual retrieval.  ...  These experimental results show that the dense retriever does not generalize well across domains, and that cross-lingual retrieval remains challenging.  ...  ., 2018) are two retrieval datasets for news documents and web pages, respectively. Dense Retrieval Model.  ... 
arXiv:2203.10232v3 fatcat:hrjvgnejpzb4pky4xzc5the4hm

MIA 2022 Shared Task Submission: Leveraging Entity Representations, Dense-Sparse Hybrids, and Fusion-in-Decoder for Cross-Lingual Question Answering [article]

Zhucheng Tu, Sarguna Janani Padmanabhan
2022 arXiv   pre-print
We describe our two-stage system for the Multilingual Information Access (MIA) 2022 Shared Task on Cross-Lingual Open-Retrieval Question Answering.  ...  The first stage consists of multilingual passage retrieval with a hybrid dense and sparse retrieval strategy.  ...  We thank Yinfei Yang, Wei Wang, and Jinhao Lei for their insightful discussions and feedback on early versions of the paper.  ... 
arXiv:2207.01940v3 fatcat:zvl27rhdtbhgnp54p3wruowbga
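
The hybrid dense and sparse retrieval strategy mentioned in the entry above can be pictured with a small score-fusion sketch. This is not the authors' code: the per-query min-max normalisation, the interpolation weight alpha, and the toy score dictionaries are illustrative assumptions only.

    # Hypothetical sketch of dense-sparse score fusion: normalise each ranker's
    # scores per query, then linearly interpolate them.
    def normalise(scores):
        """Min-max normalise a {doc_id: score} mapping into [0, 1]."""
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0
        return {doc: (s - lo) / span for doc, s in scores.items()}

    def hybrid_rank(bm25_scores, dense_scores, alpha=0.5):
        """Fuse sparse (BM25) and dense scores; a doc missing from one ranker contributes 0 there."""
        bm25_n, dense_n = normalise(bm25_scores), normalise(dense_scores)
        docs = set(bm25_n) | set(dense_n)
        fused = {d: alpha * dense_n.get(d, 0.0) + (1.0 - alpha) * bm25_n.get(d, 0.0) for d in docs}
        return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

    # Toy example: "d1" is favoured by both rankers and ends up ranked first.
    print(hybrid_rank({"d1": 12.3, "d2": 8.1}, {"d1": 0.62, "d3": 0.41}))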

Unsupervised Dense Information Retrieval with Contrastive Learning [article]

Gautier Izacard and Mathilde Caron and Lucas Hosseini and Sebastian Riedel and Piotr Bojanowski and Armand Joulin and Edouard Grave
2022 arXiv   pre-print
We show that our unsupervised models can perform cross-lingual retrieval between different scripts, such as retrieving English documents from Arabic queries, which would not be possible with term matching  ...  Finally, we evaluate our approach for multi-lingual retrieval, where training data is even scarcer than for English, and show that our approach leads to strong unsupervised performance.  ...  Mr. TyDi: A multi-lingual benchmark for dense retrieval. CoRR, abs/2108.08787, 2021.  ... 
arXiv:2112.09118v4 fatcat:arzmrzx76jdohlqjlchlz5nfbe
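
As a rough illustration of the contrastive learning objective referred to in the entry above, the sketch below computes an in-batch contrastive (InfoNCE-style) loss over query and passage embeddings. The random tensors, embedding size, and temperature are placeholders, not the paper's actual training setup.

    # Hedged sketch: in-batch contrastive loss for dense retrieval, where row i of
    # the passage batch is the positive for query i and all other rows act as negatives.
    import torch
    import torch.nn.functional as F

    def in_batch_contrastive_loss(query_emb, passage_emb, temperature=0.05):
        q = F.normalize(query_emb, dim=-1)
        p = F.normalize(passage_emb, dim=-1)
        logits = q @ p.T / temperature            # query-passage similarity matrix
        targets = torch.arange(q.size(0))         # the diagonal entries are the positives
        return F.cross_entropy(logits, targets)

    # Random embeddings stand in for encoder outputs (batch of 8, 768 dimensions).
    loss = in_batch_contrastive_loss(torch.randn(8, 768), torch.randn(8, 768))
    print(loss.item())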

MIA 2022 Shared Task: Evaluating Cross-lingual Open-Retrieval Question Answering for 16 Diverse Languages [article]

Akari Asai, Shayne Longpre, Jungo Kasai, Chia-Hsuan Lee, Rui Zhang, Junjie Hu, Ikuya Yamada, Jonathan H. Clark, Eunsol Choi
2022 arXiv   pre-print
The second-best system uses entity-aware contextualized representations for document retrieval, and achieves significant improvements in Tamil (20.8 F1), whereas most of the other systems yield nearly  ...  In this task, we adapted two large-scale cross-lingual open-retrieval QA datasets in 14 typologically diverse languages, and newly annotated open-retrieval QA data in 2 underrepresented languages: Tagalog  ...  We thank GENGO translators for translating questions into Tamil and Tagalog. We thank the EvalAI team, particularly Ram Ramrakhya, for their help with hosting the shared task submission site.  ... 
arXiv:2207.00758v1 fatcat:upipd6sp5vhcjfmmmjcmpmaqtm

Unsupervised Context Aware Sentence Representation Pretraining for Multi-lingual Dense Retrieval [article]

Ning Wu, Yaobo Liang, Houxing Ren, Linjun Shou, Nan Duan, Ming Gong, Daxin Jiang
2022 arXiv   pre-print
Recent research demonstrates the effectiveness of using pretrained language models (PLMs) to improve dense retrieval and multilingual dense retrieval.  ...  In addition, post-processing of sentence embeddings is also very effective for achieving better retrieval performance.  ...  The TYDI dataset is a multi-lingual benchmark dataset for mono-lingual query-passage retrieval in eleven typologically diverse languages, designed to evaluate ranking with learned dense representations.  ... 
arXiv:2206.03281v1 fatcat:yj3aqhzxnvgmbn5ynmgm7dtsre

Towards Best Practices for Training Multilingual Dense Retrieval Models [article]

Xinyu Zhang, Kelechi Ogueji, Xueguang Ma, Jimmy Lin
2022 arXiv   pre-print
Our study is organized as a "best practices" guide for training multilingual dense retrieval models, broken down into three main scenarios: where a multilingual transformer is available, but relevance  ...  Although recent work with multilingual transformers demonstrates that they exhibit strong cross-lingual generalization capabilities, there remain many open research questions, which we tackle here.  ...  Recommendation: Training a dense retrieval model with an English backbone, even for a non-English target language, can be effective.  ... 
arXiv:2204.02363v1 fatcat:qeqf3klwi5brjauxsepcgff3gq

Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation [article]

Shengyao Zhuang, Houxing Ren, Linjun Shou, Jian Pei, Ming Gong, Guido Zuccon, Daxin Jiang
2022 arXiv   pre-print
This problem is further exacerbated when using DSI for cross-lingual retrieval, where document text and query text are in different languages.  ...  Empirical results on popular mono-lingual and cross-lingual passage retrieval benchmark datasets show that DSI-QG significantly outperforms the original DSI model.  ...  We train the DPR model with the Tevatron dense retriever training toolkit. • DSI (Tay et al., 2022): The original DSI method that uses document texts as input for indexing.  ... 
arXiv:2206.10128v1 fatcat:z6m25kz4gvfzznupmzbnsndzoe
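
The idea of bridging indexing and retrieval with generated queries can be sketched in a doc2query style: synthetic queries are generated for each document and used as the indexing-side input. This is not the DSI-QG implementation; the checkpoint name and generation settings below are assumptions for illustration only.

    # Hypothetical doc2query-style sketch: generate a few synthetic queries for a
    # document so that the index sees query-like text (checkpoint name is assumed).
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    model_name = "doc2query/msmarco-t5-base-v1"   # assumed public checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    document = "Dense retrieval encodes queries and passages into a shared vector space."
    inputs = tokenizer(document, return_tensors="pt", truncation=True)
    outputs = model.generate(**inputs, max_length=32, do_sample=True, top_k=10, num_return_sequences=3)
    for ids in outputs:
        print(tokenizer.decode(ids, skip_special_tokens=True))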

Parameter-Efficient Neural Reranking for Cross-Lingual and Multilingual Retrieval [article]

Robert Litschko and Ivan Vulić and Goran Glavaš
2022 arXiv   pre-print
transfer to multilingual and cross-lingual retrieval tasks.  ...  In this work, we show that two parameter-efficient approaches to cross-lingual transfer, namely Sparse Fine-Tuning Masks (SFTMs) and Adapters, allow for a more lightweight and more effective zero-shot  ...  We demonstrate that this leads to more effective transfer to cross-lingual IR setups as well as to better cross-lingual transfer for monolingual retrieval in target languages with no relevance judgment  ... 
arXiv:2204.02292v2 fatcat:i47skrj7ovdbhgnobcq3f4as44

One Question Answering Model for Many Languages with Cross-lingual Dense Passage Retrieval [article]

Akari Asai, Xinyan Yu, Jungo Kasai, Hannaneh Hajishirzi
2021 arXiv   pre-print
We introduce a new dense passage retrieval algorithm that is trained to retrieve documents across languages for a question.  ...  We present Cross-lingual Open-Retrieval Answer Generation (CORA), the first unified many-to-many question answering (QA) model that can answer questions across many languages, even for ones without language-specific  ...  We thank anonymous reviewers, area chairs, Eunsol Choi, Sewon Min, David Wadden, and the members of the UW NLP group for their insightful feedback on this paper, and Gabriel Ilharco for his help on human  ... 
arXiv:2107.11976v2 fatcat:fhlc373mpfc3bcthq7g72vssem

Learning Translational and Knowledge-based Similarities from Relevance Rankings for Cross-Language Retrieval

Shigehiko Schamoni, Felix Hieber, Artem Sokolov, Stefan Riezler
2014 Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)  
In large-scale experiments on patent prior-art search and cross-lingual retrieval in Wikipedia, our approach yields considerable improvements over learning-to-rank with either only dense or only sparse  ...  We present an approach to cross-language retrieval that combines dense knowledge-based features and sparse word translations.  ...  Acknowledgments This research was supported in part by DFG grant RI-2221/1-1 "Cross-language Learning-to-Rank for Patent Retrieval".  ... 
doi:10.3115/v1/p14-2080 dblp:conf/acl/SchamoniHSR14 fatcat:fk42avwtpje2pmt76sw276vc7e

Learning to Enrich Query Representation with Pseudo-Relevance Feedback for Cross-lingual Retrieval

Ramraj Chandradevan, Eugene Yang, Mahsa Yarmohammadi, Eugene Agichtein
2022 Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval  
Recent pre-trained multilingual language models have brought large improvements to natural language tasks, including cross-lingual ad-hoc retrieval.  ...  Cross-lingual information retrieval (CLIR) aims to provide access to information across languages.  ...  With the recent advances in neural information retrieval models, cross-lingual information retrieval (CLIR) adapts such trends with increasing use of multilingual pretrained models such as  ... 
doi:10.1145/3477495.3532013 fatcat:uee56yexcrc75cmdypw3liffom
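
A very small sketch of pseudo-relevance feedback in embedding space, as a way to picture the kind of query enrichment the entry above studies: run a first retrieval pass, then mix the centroid of the top-k document embeddings back into the query embedding (a Rocchio-style update). All names and the weight beta are illustrative assumptions, not the paper's model.

    # Assumed sketch of dense pseudo-relevance feedback (Rocchio-style in vector space).
    import numpy as np

    def prf_enrich_query(query_emb, doc_embs, k=3, beta=0.3):
        """First-pass scoring by dot product, then interpolate the query with the top-k centroid."""
        scores = doc_embs @ query_emb
        top_k = doc_embs[np.argsort(-scores)[:k]]
        enriched = (1.0 - beta) * query_emb + beta * top_k.mean(axis=0)
        return enriched / np.linalg.norm(enriched)

    rng = np.random.default_rng(0)
    query, docs = rng.normal(size=128), rng.normal(size=(100, 128))
    print(prf_enrich_query(query, docs).shape)   # -> (128,)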

Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval [article]

Xinyu Zhang, Xueguang Ma, Peng Shi, Jimmy Lin
2021 arXiv   pre-print
Mr. TyDi, a multi-lingual benchmark dataset for mono-lingual retrieval in eleven typologically diverse languages, is designed to evaluate ranking with learned dense representations.  ...  In addition to analyses of our results, we also discuss future challenges and present a research agenda in multi-lingual dense retrieval.  ...  mono-lingual retrieval in non-English languages (e.g., Bengali queries against Bengali documents) rather than cross-lingual retrieval, where documents and queries are in different languages (e.g., English queries  ... 
arXiv:2108.08787v2 fatcat:4ivjwnehcbayxpmoin3y4evgvy

Teaching a New Dog Old Tricks: Resurrecting Multilingual Retrieval Using Zero-shot Learning [article]

Sean MacAvaney, Luca Soldaini, Nazli Goharian
2019 arXiv   pre-print
In this paper, we tackle the lack of data by leveraging pre-trained multilingual language models to transfer a retrieval system trained on English collections to non-English queries and documents.  ...  Our model is evaluated in a zero-shot setting, meaning that we use it to predict relevance scores for query-document pairs in languages never seen during training.  ...  For all three years, the document collection was kept in English. CLEF also hosted multiple cross-lingual ad-hoc retrieval tasks from 2000 to 2009 [3].  ... 
arXiv:1912.13080v1 fatcat:3fsqiservbbcvfrrwg4krubrzu
Showing results 1 — 15 out of 1,528 results