PARADE: Passage Representation Aggregation for Document Reranking [article]

Canjia Li, Andrew Yates, Sean MacAvaney, Ben He, Yingfei Sun
2021 arXiv   pre-print
We find that passage representation aggregation techniques can significantly improve over techniques proposed in prior work, such as taking the maximum passage score.  ...  In this work, we explore strategies for aggregating relevance signals from a document's passages into a final ranking score.  ...  METHOD In this section, we formalize approaches for aggregating passage representations into document ranking scores.  ... 
arXiv:2008.09093v2 fatcat:yu4ipuk6sndyjew4j77nzo4wby

MPII at the TREC 2020 Deep Learning Track

Canjia Li, Andrew Yates
2020 Text Retrieval Conference  
PARADE is based on the idea that aggregating passage-level relevance representations is preferable to aggregating relevance scores.  ...  The results differ from both those in the PARADE paper and those from the NTCIR-15 WWW-3 track: on this document ranking task, the least complex representation aggregation technique performs best.  ...  aggregation of passage representations).  ... 
dblp:conf/trec/LiY20 fatcat:fcsjhavsx5gptl75w5qhcjhvye

Pretrained Transformers for Text Ranking: BERT and Beyond [article]

Jimmy Lin, Rodrigo Nogueira, Andrew Yates
2021 arXiv   pre-print
There are two themes that pervade our survey: techniques for handling long documents, beyond typical sentence-by-sentence processing in NLP, and techniques for addressing the tradeoff between effectiveness  ...  We cover a wide range of modern techniques, grouped into two high-level categories: transformer models that perform reranking in multi-stage architectures and dense retrieval techniques that perform ranking  ...  Special thanks goes out to two anonymous reviewers for their insightful comments and helpful feedback.  ... 
arXiv:2010.06467v3 fatcat:obla6reejzemvlqhvgvj77fgoy

The Power of Selecting Key Blocks with Local Pre-ranking for Long Document Information Retrieval [article]

Minghan Li, Diana Nicoleta Popa, Johan Chagnon, Yagmur Gizem Cinar, Eric Gaussier
2021 arXiv   pre-print
We follow here a slightly different approach in which one first selects key blocks of a long document by local query-block pre-ranking, and then few blocks are aggregated to form a short document that  ...  Recent works dealing with this issue include truncating long documents, in which case one loses potential relevant information, segmenting them into several passages, which may lead to miss some information  ...  A hierarchical layer, in the form of a max-pooling, attention, CNN or transformer aggregator is used to aggregate the passage representations so as to obtain a joint query-document representation for long  ... 
arXiv:2111.09852v2 fatcat:iu6tm4xchzcufg43rx5lykawia

BERT-based Dense Intra-ranking and Contextualized Late Interaction via Multi-task Learning for Long Document Retrieval

Minghan Li, Eric Gaussier
2022 Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval  
However, despite its success in passage retrieval, it's not straightforward to use this approach for long document retrieval.  ...  Fast intra-ranking by dot product is used to select relevant passages, then fine-grained interaction of pre-stored token embeddings is used to generate passage scores which are aggregated to the final  ...  like TKL [10], passage representation aggregation methods like PARADE [16], and the recent selecting key block/passage for evaluation methods like KeyBLD and IDCM [9, 17, 18].  ... 
doi:10.1145/3477495.3531856 fatcat:z3a7oo2lb5fdzayi4ixgftwwc4

Pretrained Transformers for Text Ranking: BERT and Beyond

Andrew Yates, Rodrigo Nogueira, Jimmy Lin
2021 Proceedings of the 14th ACM International Conference on Web Search and Data Mining  
The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in response to a query for a particular task.  ...  We'd like to thank the following people for comments on earlier drafts of this work: Maura Grossman, Sebastian Hofstätter, Xueguang Ma, and Bhaskar Mitra.  ...  However, there remain many open research questions, and thus in addition to laying out the foundations of pretrained transformers for text ranking, this survey also attempts to prognosticate where the  ... 
doi:10.1145/3437963.3441667 fatcat:6teqmlndtrgfvk5mneq5l7ecvq

Information retrieval for label noise document ranking by bag sampling and group-wise loss [article]

Chunyu Li and Jiajia Ding and Xing hu and Fan Wang
2022 arXiv   pre-print
We use the head middle tail passage for the long document to encode the long document, and in the retrieval, stage Use dense retrieval to generate the candidate's data.  ...  However, there is still some crucial problem in long document ranking, such as data label noises, long document representations, negative data Unbalanced sampling, etc.  ...  level Document-level sorting based on paragraph representation aggregation PARADE [7] is a series of models that can divide the long text into multiple paragraphs and summarize each paragraph's [CLS  ... 
arXiv:2203.06408v1 fatcat:jc3k5s2dtneqjk2pjy6pt7e7fe

Intra-Document Cascading: Learning to Select Passages for Neural Document Ranking [article]

Sebastian Hofstätter, Bhaskar Mitra, Hamed Zamani, Nick Craswell, Allan Hanbury
2021 arXiv   pre-print
the document and then aggregating the outputs by pooling or additional Transformer layers.  ...  An emerging recipe for achieving state-of-the-art effectiveness in neural document re-ranking involves utilizing large pre-trained language models - e.g., BERT - to evaluate all individual passages in  ...  ACKNOWLEDGMENTS This work was supported in part by the Center for Intelligent Information Retrieval.  ... 
arXiv:2105.09816v1 fatcat:33o7fymcyzgrxidln2agcur2f4

Pre-training Methods in Information Retrieval [article]

Yixing Fan, Xiaohui Xie, Yinqiong Cai, Jia Chen, Xinyu Ma, Xiangsheng Li, Ruqing Zhang, Jiafeng Guo
2022 arXiv   pre-print
In addition, we also introduce PTMs specifically designed for IR, and summarize available datasets as well as benchmark leaderboards.  ...  Owing to sophisticated pre-training objectives and huge model size, pre-trained models can learn universal language representations from massive textual data, which are beneficial to the ranking task of  ...  representation aggregation methods like PARADE max .  ... 
arXiv:2111.13853v3 fatcat:pilemnpphrgv5ksaktvctqdi4y

An In-depth Analysis of Passage-Level Label Transfer for Contextual Document Ranking [article]

Koustav Rudra and Zeon Trevor Fernando and Avishek Anand
2021 arXiv   pre-print
We find that direct transfer of relevance labels from documents to passages introduces label noise that strongly affects retrieval effectiveness for large training datasets.  ...  Common approaches either truncate or split longer documents into small sentences/passages and subsequently label them - using the original document label or from another externally trained model.  ...  Recently, Li et al. [32] proposed an end-to-end PARADE method to overcome the limitation of independent inference of passages and predict a document's relevance by aggregating passage representations.  ... 
arXiv:2103.16669v1 fatcat:isncaabddnb2lgbaeffkdvdpse

Coarse-grain Fine-grain Coattention Network for Multi-evidence Question Answering [article]

Victor Zhong, Caiming Xiong, Nitish Shirish Keskar, Richard Socher
2019 arXiv   pre-print
across all of the documents with the query.  ...  In this work, we propose the Coarse-grain Fine-grain Coattention Network (CFC), a new question answering model that combines information from evidence across multiple documents.  ...  ACKNOWLEDGEMENT The authors thank Luke Zettlemoyer for his feedback and advice and Sewon Min for her help in preprocessing the TriviaQA dataset.  ... 
arXiv:1901.00603v2 fatcat:lrj7qkfmtrbazhixas4x4eigrm

IntenT5: Search Result Diversification using Causal Language Models [article]

Sean MacAvaney, Craig Macdonald, Roderick Murray-Smith, Iadh Ounis
2021 arXiv   pre-print
Existing approaches often rely on massive query logs and interaction data to generate a variety of possible query intents, which then can be used to re-rank documents.  ...  Our analysis shows that our approach is most effective for multi-faceted queries and is able to generalize effectively to queries that were unseen in training data.  ...  ACKNOWLEDGMENTS This work has been supported by EPSRC grant EP/R018634/1: Closed-Loop Data Science for Complex, Computationally-& Data-Intensive Analytics.  ... 
arXiv:2108.04026v1 fatcat:z3avwzgcovbbvjl54qjkemvmxy

A Self-supervised Joint Training Framework for Document Reranking

Xiaozhi Zhu, Tianyong Hao, Sijie Cheng, Fu Lee Wang, Hai Liu
2022 Findings of the Association for Computational Linguistics: NAACL 2022   unpublished
document reranking task.  ...  Pretrained language models such as BERT have been successfully applied to a wide range of natural language processing tasks and also achieved impressive performance in document reranking tasks.  ...  Acknowledgements We thank all anonymous reviewers for their insightful comments to import this paper.  ... 
doi:10.18653/v1/2022.findings-naacl.79 fatcat:d7v52spqn5fsxfmizdrca3r2vu

Pretrained Transformers for Text Ranking: BERT and Beyond

Andrew Yates, Rodrigo Nogueira, Jimmy Lin
2021 Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Tutorials   unpublished
Cross-domain modeling of sentence-level evidence for document retrieval. In Amodei. 2020. Language models are few-shot learners. arXiv:2005.14165.  ...  We cover a wide range of techniques, grouped into two categories: transformer models that perform reranking in multi-stage ranking architectures and learned dense representations that perform ranking directly  ...  PARADE: Passage representation aggregation for document reranking. arXiv:2008.09093. Lin, Rodrigo Nogueira, and Andrew Yates. 2020a.  ... 
doi:10.18653/v1/2021.naacl-tutorials.1 fatcat:yv5njdhamvd7tbx75r5mwnofqe

Long Document Re-ranking with Modular Re-ranker [article]

Luyu Gao, Jamie Callan
2022 pre-print
Long document re-ranking has been a challenging problem for neural re-rankers based on deep language models like BERT. Early work breaks the documents into short passage-like chunks.  ...  We demonstrate that the model can use this new degree of freedom to aggregate important information from the entire document.  ...  The authors would like to thank Google's TPU Research Cloud (TRC) for access to Cloud TPUs and the anonymous reviewers for the reviews.  ... 
doi:10.1145/3477495.3531860 arXiv:2205.04275v1 fatcat:qy47s437bngr3jg37t46zdpdwa