10,299 Hits in 4.9 sec

Long Document Ranking with Query-Directed Sparse Transformer [article]

Jyun-Yu Jiang, Chenyan Xiong, Chia-Jung Lee, Wei Wang
2020 arXiv   pre-print
In this paper, we design Query-Directed Sparse attention that induces IR-axiomatic structures in transformer self-attention.  ...  The computing cost of transformer self-attention often necessitates breaking long documents to fit in pretrained models in document ranking tasks.  ...  This paper presents Query-Directed Sparse Transformer (QDS-Transformer) for long document ranking.  ... 
arXiv:2010.12683v1 fatcat:rgjxvvidpjbzdlp23ir26l43ii

Long Document Ranking with Query-Directed Sparse Transformer

Jyun-Yu Jiang, Chenyan Xiong, Chia-Jung Lee, Wei Wang
2020 Findings of the Association for Computational Linguistics: EMNLP 2020   unpublished
In this paper, we design Query-Directed Sparse attention that induces IR-axiomatic structures in transformer self-attention.  ...  The computing cost of transformer self-attention often necessitates breaking long documents to fit in pretrained models in document ranking tasks.  ...  This paper presents Query-Directed Sparse Transformer (QDS-Transformer) for long document ranking.  ... 
doi:10.18653/v1/2020.findings-emnlp.412 fatcat:32lr7p4nbvaydcunmscxpc723a
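
The query-directed sparse attention described above combines local attention over document tokens with global attention on the query tokens. The following is a minimal sketch of such an attention mask, not the authors' implementation; the window size, token layout, and NumPy representation are illustrative assumptions.

import numpy as np

def query_directed_mask(num_query_tokens: int, num_doc_tokens: int, window: int = 2) -> np.ndarray:
    """Return a boolean [seq, seq] mask; True means attention is allowed."""
    seq = num_query_tokens + num_doc_tokens
    mask = np.zeros((seq, seq), dtype=bool)

    # Query tokens attend to every position, and every position attends to them.
    mask[:num_query_tokens, :] = True
    mask[:, :num_query_tokens] = True

    # Document tokens additionally attend within a local sliding window.
    for i in range(num_query_tokens, seq):
        lo = max(num_query_tokens, i - window)
        hi = min(seq, i + window + 1)
        mask[i, lo:hi] = True
    return mask

if __name__ == "__main__":
    m = query_directed_mask(num_query_tokens=3, num_doc_tokens=8, window=1)
    print(m.astype(int))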

The Power of Selecting Key Blocks with Local Pre-ranking for Long Document Information Retrieval [article]

Minghan Li, Diana Nicoleta Popa, Johan Chagnon, Yagmur Gizem Cinar, Eric Gaussier
2021 arXiv   pre-print
We follow here a slightly different approach in which one first selects key blocks of a long document by local query-block pre-ranking, and then a few blocks are aggregated to form a short document that  ...  Recent works dealing with this issue include truncating long documents, in which case one loses potentially relevant information, segmenting them into several passages, which may lead to missing some information  ...  As discussed above, Query-Directed Sparse Transformer (QDS-Transformer) [24] is designed with sparse local attention and global attention for long document information retrieval.  ... 
arXiv:2111.09852v2 fatcat:iu6tm4xchzcufg43rx5lykawia
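
The block-selection idea in the entry above can be sketched as: split the long document into fixed-size blocks, score each block against the query with a cheap local scorer, and keep only the top blocks (in their original order) as the short document passed to a neural ranker. The block size, the term-overlap scorer standing in for BM25, and top_k are illustrative assumptions, not the paper's configuration.

from collections import Counter

def split_into_blocks(tokens, block_size=64):
    return [tokens[i:i + block_size] for i in range(0, len(tokens), block_size)]

def block_score(query_tokens, block_tokens):
    # Simple term-overlap score as a stand-in for BM25 or a small neural scorer.
    block_counts = Counter(block_tokens)
    return sum(block_counts[t] for t in set(query_tokens))

def select_key_blocks(query, document, block_size=64, top_k=4):
    q = query.lower().split()
    blocks = split_into_blocks(document.lower().split(), block_size)
    ranked = sorted(range(len(blocks)), key=lambda i: block_score(q, blocks[i]), reverse=True)
    keep = sorted(ranked[:top_k])                  # preserve original block order
    return " ".join(" ".join(blocks[i]) for i in keep)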

How Different are Pre-trained Transformers for Text Ranking? [article]

David Rau, Jaap Kamps
2022 arXiv   pre-print
Is the gain in performance due to a better ranking of the same documents (prioritizing precision)? On the other hand, what is different?  ...  In recent years, large pre-trained transformers have led to substantial gains in performance over traditional retrieval models and feedback approaches.  ...  Another interesting direction is to enforce sparse encoding and be able to relate neural ranking to sparse retrieval [18], [6]. Although related, the work in [16] differs in two important aspects.  ... 
arXiv:2204.07233v1 fatcat:e6g44h5ptnhtfodwdvgs7kbkeq

Semantic Models for the First-stage Retrieval: A Comprehensive Review [article]

Yinqiong Cai, Yixing Fan, Jiafeng Guo, Fei Sun, Ruqing Zhang, Xueqi Cheng
2021 arXiv   pre-print
Moreover, we identify some open challenges and envision some future directions, with the hope of inspiring more research on these important yet less investigated topics.  ...  Multi-stage ranking pipelines have been a practical solution in modern search systems, where the first-stage retrieval returns a subset of candidate documents, and latter stages attempt to re-rank  ...  [184] proposed a standalone neural ranking model to learn a latent sparse representation for each query and document.  ... 
arXiv:2103.04831v3 fatcat:6qa7hvc3jve3pcmo2mo4qsiefq

Pre-training Methods in Information Retrieval [article]

Yixing Fan, Xiaohui Xie, Yinqiong Cai, Jia Chen, Xinyu Ma, Xiangsheng Li, Ruqing Zhang, Jiafeng Guo
2022 arXiv   pre-print
Moreover, we discuss some open challenges and highlight several promising directions, with the hope of inspiring and facilitating more works on these topics for future research.  ...  Considering the rapid progress of this direction, this survey aims to provide a systematic review of pre-training methods in IR.  ...  long document matching tasks.  ... 
arXiv:2111.13853v3 fatcat:pilemnpphrgv5ksaktvctqdi4y

SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking

Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant
2021 Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval  
with respect to state-of-the-art dense and sparse methods.  ...  In neural Information Retrieval, ongoing research is directed towards improving the first retriever in ranking pipelines.  ...  While BOW models remain strong baselines [27], they suffer from the long-standing vocabulary mismatch problem, where relevant documents might not contain terms that appear in the query.  ... 
doi:10.1145/3404835.3463098 fatcat:bys3mlnjx5c4rjggusnharrpzu

SPLADE: Sparse Lexical and Expansion Model for First Stage Ranking [article]

Thibault Formal, Benjamin Piwowarski, Stéphane Clinchant
2021 arXiv   pre-print
with respect to state-of-the-art dense and sparse methods.  ...  In neural Information Retrieval, ongoing research is directed towards improving the first retriever in ranking pipelines.  ...  and transform(.) is a linear layer with GeLU activation and LayerNorm.  ... 
arXiv:2107.05720v1 fatcat:qyzhnwqblzdzlebpmnn4lk4i5u
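
SPLADE's sparse representations come from passing masked-language-model logits through a log-saturated ReLU and pooling over the input tokens. The sketch below illustrates only that aggregation; it loads a plain DistilBERT MLM checkpoint as a stand-in, whereas a real SPLADE model is trained with ranking and sparsity (FLOPS) objectives, so these weights are not meaningful retrieval weights.

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

name = "distilbert-base-uncased"          # stand-in checkpoint, not a trained SPLADE model
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)

def splade_representation(text: str) -> torch.Tensor:
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**enc).logits                      # [1, seq_len, vocab_size]
    weights = torch.log1p(torch.relu(logits))             # log(1 + ReLU(w_ij))
    mask = enc["attention_mask"].unsqueeze(-1)            # ignore padding positions
    return (weights * mask).sum(dim=1).squeeze(0)         # [vocab_size], mostly near zero

rep = splade_representation("sparse lexical retrieval with expansion")
top = torch.topk(rep, k=10)
print(tokenizer.convert_ids_to_tokens(top.indices.tolist()))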

Pyserini: An Easy-to-Use Python Toolkit to Support Replicable IR Research with Sparse and Dense Representations [article]

Jimmy Lin, Xueguang Ma, Sheng-Chieh Lin, Jheng-Hong Yang, Ronak Pradeep, Rodrigo Nogueira
2021 arXiv   pre-print
In particular, Pyserini supports sparse retrieval (e.g., BM25 scoring using bag-of-words representations), dense retrieval (e.g., nearest-neighbor search on transformer-encoded representations), as well  ...  Our toolkit is self-contained as a standard Python package and comes with queries, relevance judgments, pre-built indexes, and evaluation scripts for many commonly used IR test collections.  ...  In our view, the two most important research directions are transformer-based reranking models and learned dense representations for ranking.  ... 
arXiv:2102.10073v1 fatcat:lmuels3qwfcn7hl2jyqbnb7dbq
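
A short usage sketch of the sparse (BM25) retrieval interface the abstract mentions; the class and pre-built index names follow the Pyserini documentation around the time of this paper, and the exact names are version-dependent (dense retrieval is exposed through a separate searcher and query encoder).

from pyserini.search import SimpleSearcher

searcher = SimpleSearcher.from_prebuilt_index('msmarco-passage')   # downloads a pre-built BM25 index
hits = searcher.search('what is a lobster roll?', k=10)
for i, hit in enumerate(hits):
    print(f'{i + 1:2} {hit.docid:15} {hit.score:.4f}')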

Wacky Weights in Learned Sparse Representations and the Revenge of Score-at-a-Time Query Evaluation [article]

Joel Mackenzie, Andrew Trotman, Jimmy Lin
2021 arXiv   pre-print
Recent advances in retrieval models based on learned sparse representations generated by transformers have led us to, once again, consider score-at-a-time query evaluation techniques for the top-k retrieval  ...  In our experiments with four different retrieval models that exploit representational learning with bags of words, we find that transformers generate "wacky weights" that appear to greatly reduce the opportunities  ...  ranked disjunction, with mean latency values of  ...  Exact query evaluation (i.e., exhaustively traversing all postings) with JASS is slower than PISA, but achieves comparable effectiveness.  ... 
arXiv:2110.11540v2 fatcat:vc3yxwnju5c4vplq7uk3swd6tq
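
Score-at-a-time evaluation, which the entry above revisits, processes impact-ordered postings segments from all query terms in decreasing impact order, accumulating quantized impacts per document and optionally stopping after a postings budget. The toy index and budget below are illustrative assumptions, not JASS or PISA code.

import heapq
from collections import defaultdict

# Impact-ordered index: term -> list of (quantized_impact, [doc_ids]) segments.
index = {
    "sparse":      [(7, [1]), (5, [3]), (2, [9])],
    "transformer": [(6, [1, 9]), (4, [4])],
    "ranking":     [(8, [4]), (3, [3])],
}

def score_at_a_time(query_terms, k=2, postings_budget=None):
    segments = [seg for t in query_terms for seg in index.get(t, [])]
    segments.sort(key=lambda s: s[0], reverse=True)        # highest impacts first
    accumulators, processed = defaultdict(int), 0
    for impact, doc_ids in segments:
        for doc_id in doc_ids:
            accumulators[doc_id] += impact
            processed += 1
        if postings_budget is not None and processed >= postings_budget:
            break                                           # anytime early termination
    return heapq.nlargest(k, accumulators.items(), key=lambda kv: kv[1])

print(score_at_a_time(["sparse", "transformer", "ranking"]))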

Understanding Performance of Long-Document Ranking Models through Comprehensive Evaluation and Leaderboarding [article]

Leonid Boytsov, Tianyi Lin, Fangwei Gao, Yutian Zhao, Jeffrey Huang, Eric Nyberg
2022 arXiv   pre-print
We carry out a comprehensive evaluation of 13 recent models for ranking of long documents using two popular collections (MS MARCO documents and Robust04).  ...  Our model zoo includes two specialized Transformer models (such as Longformer) that can process long documents without the need to split them.  ...  Our key finding is that ranking models capable of processing long documents, including specialized Transformers with sparse attention [4, 62], show little improvement from encoding whole documents, compared  ... 
arXiv:2207.01262v1 fatcat:rzampfd3mbav3mxnujwuntthyq

PARADE: Passage Representation Aggregation for Document Reranking [article]

Canjia Li, Andrew Yates, Sean MacAvaney, Ben He, Yingfei Sun
2021 arXiv   pre-print
Pretrained transformer models, such as BERT and T5, have been shown to be highly effective at ad-hoc passage and document ranking.  ...  In particular, PARADE can significantly improve results on collections with broad information needs where relevance signals can be spread throughout the document (such as TREC Robust04 and GOV2).  ...  QDS-Transformer tailors Longformer to the ranking task with query-directed sparse attention [38].  ... 
arXiv:2008.09093v2 fatcat:yu4ipuk6sndyjew4j77nzo4wby
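
PARADE's core idea is to aggregate per-passage representations (e.g., BERT [CLS] vectors) into a document-level relevance signal. The sketch below uses random placeholder vectors in place of BERT outputs and shows two aggregation styles in the spirit of PARADE-Max and PARADE-Transformer; the dimensions, pooling choices, and single-layer aggregator are assumptions rather than the paper's exact setup.

import torch
import torch.nn as nn

hidden, num_passages = 768, 8
cls_vectors = torch.randn(1, num_passages, hidden)     # [batch, passages, hidden], stand-in for BERT [CLS] outputs

# Max-pooling style: element-wise max over passage representations.
max_pooled = cls_vectors.max(dim=1).values              # [batch, hidden]

# Transformer-aggregation style: let passage representations attend to each other, then pool.
aggregator = nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True)
contextual = aggregator(cls_vectors)                     # [batch, passages, hidden]
doc_repr = contextual.mean(dim=1)                        # [batch, hidden]

score_head = nn.Linear(hidden, 1)                        # final relevance projection
print(score_head(max_pooled).item(), score_head(doc_repr).item())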

A Survey of Transformers [article]

Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu
2021 arXiv   pre-print
Finally, we outline some potential directions for future research.  ...  Up to the present, a great variety of Transformer variants (a.k.a.  ...  The improvements on the attention mechanism can be divided into several directions: (1) Sparse Attention.  ... 
arXiv:2106.04554v2 fatcat:pjctgoqeffhq7ntyw52jqwfzsy

Sparse and Dense Approaches for the Full-rank Retrieval of Responses for Dialogues [article]

Gustavo Penha, Claudia Hauff
2022 arXiv   pre-print
Ranking responses for a given dialogue context is a popular benchmark in which the setup is to re-rank the ground-truth response over a limited set of n responses, where n is typically 10.  ...  We find the best performing method overall to be dense retrieval with intermediate training, i.e. a step after the language model pre-training where sentence representations are learned, followed by fine-tuning  ...  Sparse Retrieval In order to do sparse retrieval of responses we rely on classical retrieval methods with query and document expansion techniques.  ... 
arXiv:2204.10558v1 fatcat:jbab4qy6rrdetisxxa6cuapbi4

Designing An Information Framework For Semantic Search

İsmail Burak PARLAK
2022 European Journal of Science and Technology  
Semantic search methods were evaluated and then compared with lexical methods on data sets consisting of scientific documents.  ...  dealing with out-of-context data and semantic conflicts.  ...  Transformers are stacked with multiple layers. In each transformer layer, each vector is projected by three linear layers to Query, Key, and Value.  ... 
doi:10.31590/ejosat.1043441 fatcat:vq2z7kg4ozd4rjobkvpqlubiby
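
The snippet's description of a transformer layer, where each token vector is projected by three linear layers into query, key, and value, corresponds to standard scaled dot-product attention, sketched below with illustrative sizes.

import torch
import torch.nn as nn
import torch.nn.functional as F

d_model = 64
seq = torch.randn(1, 10, d_model)                        # [batch, tokens, d_model]

w_q, w_k, w_v = (nn.Linear(d_model, d_model) for _ in range(3))
q, k, v = w_q(seq), w_k(seq), w_v(seq)

scores = q @ k.transpose(-2, -1) / (d_model ** 0.5)      # [batch, tokens, tokens]
attn = F.softmax(scores, dim=-1)                         # attention weights per token
output = attn @ v                                        # contextualized token vectors
print(output.shape)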
Showing results 1 — 15 out of 10,299 results