2,496 Hits in 5.8 sec

Context-Aware Sentence/Passage Term Importance Estimation For First Stage Retrieval [article]

Zhuyun Dai, Jamie Callan
2019 arXiv pre-print
This paper proposes a Deep Contextualized Term Weighting framework that learns to map BERT's contextualized text representations to context-aware term weights for sentences and passages.  ...  When applied to passages, DeepCT-Index produces term weights that can be stored in an ordinary inverted index for passage retrieval.  ...  Our results show that a deep, contextualized neural language model is able to capture some of the desired properties, and can be used to generate effective term weights for passage indexing.  ... 
arXiv:1910.10687v2 fatcat:sdae46aknvfldby52xaj2f53la
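The DeepCT-Index idea in the snippet above can be sketched as follows: predicted term weights are stored in an ordinary inverted index. This is a minimal sketch under stated assumptions. The weights are hand-written placeholders standing in for the BERT-based predictions. The scale factor of 100, which turns real-valued weights into integer pseudo term frequencies, is an assumption. The helpers build_index and score are hypothetical names. Scoring is a simplified sum of stored weights rather than a full ranking function such as BM25.

```python
from collections import defaultdict

# Hypothetical predicted weights in [0, 1] for each term of each passage.
# In DeepCT these would come from a BERT-based model; here they are placeholders.
predicted = {
    "p1": {"bert": 0.9, "term": 0.6, "weighting": 0.7, "the": 0.004},
    "p2": {"bm25": 0.8, "term": 0.3, "frequency": 0.5, "the": 0.02},
}

SCALE = 100  # assumed factor mapping real-valued weights to integer pseudo-TFs


def build_index(weights_by_doc, scale=SCALE):
    """Quantize each weight to an integer pseudo term frequency and store it in postings."""
    index = defaultdict(list)  # term -> list of (doc_id, pseudo_tf)
    for doc_id, weights in weights_by_doc.items():
        for term, w in weights.items():
            tf = round(w * scale)
            if tf > 0:  # terms whose weight rounds to zero are dropped from the index
                index[term].append((doc_id, tf))
    return index


def score(index, query_terms):
    """Simplified impact-style scoring: sum the pseudo-TFs of matching query terms."""
    scores = defaultdict(int)
    for term in query_terms:
        for doc_id, tf in index.get(term, []):
            scores[doc_id] += tf
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


index = build_index(predicted)
print(score(index, ["term", "weighting"]))
```

Because the weights are baked into the postings as plain integers, any standard inverted-index engine can store and search them without a neural model at query time. In this toy example, the low-weight "the" in p1 rounds to zero and disappears from the index.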

BERT-based Dense Intra-ranking and Contextualized Late Interaction via Multi-task Learning for Long Document Retrieval

Minghan Li, Eric Gaussier
2022 Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval  
In contrast, dense retrieval methods in representation-based approaches are known to be efficient, however less effective.  ...  effective interaction while preserving efficiency.  ...  ACKNOWLEDGMENTS This work has been partially supported by MIAI@Grenoble Alpes (ANR-19-P3IA-0003) and the Chinese Scholarship Council (CSC) grant No.201906960018.  ... 
doi:10.1145/3477495.3531856 fatcat:z3a7oo2lb5fdzayi4ixgftwwc4

VisualSparta: An Embarrassingly Simple Approach to Large-scale Text-to-Image Search with Weighted Bag-of-words [article]

Xiaopeng Lu, Tiancheng Zhao, Kyusong Lee
2021 arXiv pre-print
Text-to-image retrieval is an essential task in cross-modal information retrieval, i.e., retrieving relevant images from a large and unlabelled dataset given textual queries.  ...  We also show that it achieves substantial retrieving speed advantages, i.e., for a 1 million image index, VisualSparta using CPU gets ~391X speedup compared to CPU vector search and ~5.4X speedup compared  ...  bag-of-words is shown to be an effective representation for cross-modal retrieval that can be efficiently indexed in an Inverted Index for fast retrieval. (3) Detailed analysis and ablation study that  ... 
arXiv:2101.00265v2 fatcat:6w6fxgaas5eddd4imrkfofgspe

Semantic Models for the First-stage Retrieval: A Comprehensive Review [article]

Yinqiong Cai, Yixing Fan, Jiafeng Guo, Fei Sun, Ruqing Zhang, Xueqi Cheng
2021 arXiv pre-print
methods and neural semantic retrieval methods.  ...  Therefore, it has been a long-term desire to build semantic models for the first-stage retrieval that can achieve high recall efficiently.  ...  Based on this, it can achieve cheap interaction and high-efficient pruning for top-relevant documents retrieval.  ... 
arXiv:2103.04831v3 fatcat:6qa7hvc3jve3pcmo2mo4qsiefq

Pre-trained Language Model based Ranking in Baidu Search [article]

Lixin Zou, Shengqiang Zhang, Hengyi Cai, Dehong Ma, Suqi Cheng, Daiting Shi, Zhifan Zhu, Weiyue Su, Shuaiqiang Wang, Zhicong Cheng, Dawei Yin
2021 arXiv pre-print
objectives and the ad-hoc retrieval scenarios that demand comprehensive relevance modeling is another main barrier for improving the online ranking system; (3) a real-world search engine typically involves  ...  We first articulate a novel practice to cost-efficiently summarize the web document and contextualize the resultant summary content with the query using a cheap yet powerful Pyramid-ERNIE architecture.  ...  Then, the Pyramid-ERNIE captures the comprehensive query-document relevance using contextualized interactions over the previously generated representations for the sake of balancing the efficiency-effectiveness  ... 
arXiv:2105.11108v3 fatcat:dbvj65ugovaani4hsiwtl6bcdi

Fast Passage Re-ranking with Contextualized Exact Term Matching and Efficient Passage Expansion [article]

Shengyao Zhuang, Guido Zuccon
2021 arXiv pre-print
This however is at the expense of a lower effectiveness compared to other BERT-based re-rankers and dense retrievers.  ...  BERT-based information retrieval models are expensive, in both time (query latency) and computational resources (energy, hardware cost), making many of these models impractical especially under resource  ...  Dense Retrievers: We also consider dense retrievers, and specifically RepBERT [44] and ANCE [41] , as means of very efficient neural methods for retrieval.  ... 
arXiv:2108.08513v2 fatcat:ldygq7crijbsdlt62hxfu5elke

RepBERT: Contextualized Text Embeddings for First-Stage Retrieval [article]

Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Min Zhang, Shaoping Ma
2020 arXiv pre-print
On MS MARCO Passage Ranking task, RepBERT achieves state-of-the-art results among all initial retrieval techniques. And its efficiency is comparable to bag-of-words methods.  ...  The inner products of query and document embeddings are regarded as relevance scores.  ...  Typically, efficient bag-of-words models are often adopted for initial retrieval, and neural ranking models are utilized for reranking.  ... 
arXiv:2006.15498v2 fatcat:ob2jvmanabfchfajgs6w4o2k2y
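The RepBERT snippet states that the inner product of a query embedding and a document embedding is used as the relevance score. The sketch below shows exhaustive top-k retrieval under that rule with NumPy. The 4-dimensional vectors are made-up placeholders for encoder outputs, and top_k is a hypothetical helper, not part of RepBERT's code.

```python
import numpy as np

# Hypothetical document embeddings (one row per document) standing in for encoder outputs.
doc_ids = ["d1", "d2", "d3"]
doc_emb = np.array([
    [0.1, 0.3, 0.0, 0.5],
    [0.4, 0.1, 0.2, 0.0],
    [0.0, 0.2, 0.6, 0.1],
])
# Hypothetical query embedding in the same vector space.
query_emb = np.array([0.2, 0.0, 0.5, 0.1])


def top_k(query, docs, ids, k=2):
    """Score every document by inner product with the query and return the k best."""
    scores = docs @ query  # one inner product per document row
    order = np.argsort(-scores, kind="stable")[:k]  # indices of highest scores first
    return [(ids[i], float(scores[i])) for i in order]


print(top_k(query_emb, doc_emb, doc_ids))
```

Each score is a single dot product, so document embeddings can be computed offline and only the query has to be encoded at search time. At larger scale, the exhaustive matrix product would typically be replaced by an approximate nearest-neighbour index such as Faiss.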

Modularized Transfomer-based Ranking Framework [article]

Luyu Gao, Zhuyun Dai, Jamie Callan
2020 arXiv pre-print
We show how this design enables substantially faster ranking using offline pre-computed representations and light-weight online interactions.  ...  In this work, we modularize the Transformer ranker into separate modules for text representation and interaction.  ...  The authors would also like to thank Graham Neubig and Chenyan Xiong for helpful discussions and feedbacks. A.3 Datasets We use MSMARCO, ClueWeb09-b and Robust04.  ... 
arXiv:2004.13313v3 fatcat:5fhavvvaerhoppu73ebm52cjli

Embedding-based Recommender System for Job to Candidate Matching on Scale [article]

Jing Zhao, Jingya Wang, Madhav Sigdel, Bopeng Zhang, Phuong Hoang, Mengshu Liu, Mohammed Korayem
2021 arXiv pre-print
To learn the comprehensive and effective embedding for job posts and candidates, we have constructed a fused-embedding via different levels of representation learning from raw text, semantic entities and  ...  The clusters of fused-embedding of job and candidates are then used to build and train the Faiss index that supports runtime approximate nearest neighbor search for candidate retrieval.  ...  We would also like to dedicate this paper to Bopeng to recognize his crucial contribution and achievement during his days at CareerBuilder.  ... 
arXiv:2107.00221v1 fatcat:rgj2hqndwvaoxbl3nscff4udum

Neural Ranking Models for Document Retrieval [article]

Mohamed Trabelsi, Zhiyu Chen, Brian D. Davison, Jeff Heflin
2021 arXiv pre-print
A variety of deep learning models have been proposed, and each model presents a set of neural network components to extract features that are used for ranking.  ...  We also show the analogy between document retrieval and other retrieval tasks where the items to be ranked are structured documents, answers, images and videos.  ...  Effective user interaction for high-recall retrieval: Less is more. In Proceedings of the 27th ACM International Conference on .  ... 
arXiv:2102.11903v1 fatcat:zc2otf456rc2hj6b6wpcaaslsa

A Neural Corpus Indexer for Document Retrieval [article]

Yujing Wang, Yingyan Hou, Haonan Wang, Ziming Miao, Shibin Wu, Hao Sun, Qi Chen, Yuqing Xia, Chengmin Chi, Guoshuai Zhao, Zheng Liu, Xing Xie (+4 others)
2022 arXiv pre-print
To this end, we propose Neural Corpus Indexer (NCI), a sequence-to-sequence network that generates relevant document identifiers directly for a designated query.  ...  Current state-of-the-art document retrieval solutions mainly follow an index-retrieve paradigm, where the index is hard to be optimized for the final retrieval target.  ...  At search time, we build representation vectors for query tokens and perform contextualized exact match to retrieve relevant documents based on inverted index.  ... 
arXiv:2206.02743v1 fatcat:g2xuxfafindqjlh27pqdzr2m5a

Multi-Layer Contextual Passage Term Embedding for Ad-Hoc Retrieval

Weihong Cai, Zijun Hu, Yalan Luo, Daoyuan Liang, Yifan Feng, Jiaxin Chen
2022 Information  
new possibilities for the long document relevance task.  ...  neural ranking models.  ...  Acknowledgments: We would like to thank the authors of all datasets used in this paper for making the data available to the community.  ... 
doi:10.3390/info13050221 fatcat:f4f7knprrfbvnhenverilndatm

Retrieval-Augmented Reinforcement Learning [article]

Anirudh Goyal, Abram L. Friesen, Andrea Banino, Theophane Weber, Nan Rosemary Ke, Adria Puigdomenech Badia, Arthur Guez, Mehdi Mirza, Peter C. Humphreys, Ksenia Konyushkova, Laurent Sifre, Michal Valko (+4 others)
2022 arXiv pre-print
The retrieval process is trained to retrieve information from the dataset that may be useful in the current context, to help the agent achieve its goal faster and more efficiently. The proposed method facilitates  ...  In offline multi-task problems, we show that the retrieval-augmented DQN agent avoids task interference and learns faster than the baseline DQN agent.  ...  For retrieval to be effective, the retrieval process needs to: (1) be able to efficiently query a large dataset of trajectories, (2) learn and employ a similarity function to find relevant trajectories  ... 
arXiv:2202.08417v4 fatcat:lqirnej77neu7hdmo7sbmddg7i

Offline Evaluation and Optimization for Interactive Systems

Lihong Li
2015 Proceedings of the Eighth ACM International Conference on Web Search and Data Mining - WSDM '15  
Bing offline umecka umcka (treatments for cold symptoms) left leff (right correction) (correct word breaking) {umecka and zinc} vs.  ...  Offline Evaluation of  ...  Online evaluation: controlled experiments (A/B tests); wait for days/weeks/months and compute average reward; reliable but expensive. Offline evaluation: use historical  ...  check offline to detect bugs; use standard t-test to detect ≠  ... 
doi:10.1145/2684822.2697040 dblp:conf/wsdm/Li15 fatcat:2ap6hcpimfh5xogmdzif6ar6ri

An Evaluation of Weakly-Supervised DeepCT in the TREC 2019 Deep Learning Track

Zhuyun Dai, Jamie Callan
2019 Text Retrieval Conference  
The weighted document is stored in an ordinary inverted index and searched using a multi-field BM25, which is efficient.  ...  It used the contextualized token embeddings generated by BERT to estimate a term's importance in passages, and combines passage term weights into document-level term weights.  ...  Any opinions, findings, and conclusions in this paper are the authors' and do not necessarily reflect those of the sponsor.  ... 
dblp:conf/trec/DaiC19 fatcat:fuuu4dntlvhwdfcijac6qftf3y
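The TREC snippet describes two steps. First, passage-level term weights are combined into document-level weights. Second, the weighted documents are searched with multi-field BM25. The sketch below illustrates both steps under explicit assumptions. aggregate_passages sums passage pseudo-TFs; the snippet does not give the actual combination rule. bm25f_score follows a BM25F-style formulation: per-field, length-normalised term frequencies are weighted and combined, then saturated once. Whether DeepCT used exactly this variant is not stated. All function names, field names, weights, b values and average lengths are hypothetical.

```python
import math


def aggregate_passages(passage_tfs):
    """Combine per-passage pseudo-TFs into document-level TFs (assumed rule: sum)."""
    doc_tf = {}
    for tfs in passage_tfs:
        for term, tf in tfs.items():
            doc_tf[term] = doc_tf.get(term, 0) + tf
    return doc_tf


def bm25f_score(query_terms, doc_fields, field_weights, field_b, avg_len, df, n_docs, k1=1.2):
    """BM25F-style score: length-normalise each field's TF, weight and sum across fields, saturate once."""
    score = 0.0
    for term in query_terms:
        combined_tf = 0.0
        for field, tfs in doc_fields.items():
            tf = tfs.get(term, 0)
            if tf == 0:
                continue
            length = sum(tfs.values())  # field length taken as the sum of its pseudo-TFs
            norm = 1 - field_b[field] + field_b[field] * length / avg_len[field]
            combined_tf += field_weights[field] * tf / norm
        if combined_tf == 0 or term not in df:
            continue
        idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
        score += idf * combined_tf / (k1 + combined_tf)
    return score


# Hypothetical DeepCT-style pseudo-TFs for two passages of one document.
passages = [{"deep": 40, "ct": 55}, {"ct": 20, "index": 30}]
body = aggregate_passages(passages)
doc = {"title": {"deepct": 90}, "body": body}
s = bm25f_score(
    ["ct", "index"], doc,
    field_weights={"title": 2.0, "body": 1.0},
    field_b={"title": 0.3, "body": 0.75},
    avg_len={"title": 80.0, "body": 150.0},
    df={"ct": 5, "index": 20},
    n_docs=100,
)
print(body, round(s, 3))
```

Saturating the combined frequency once, rather than summing per-field BM25 scores, keeps a term that appears in several fields from being over-counted. This is the usual motivation for BM25F-style scoring.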
Showing results 1-15 out of 2,496 results