25,878 Hits in 11.0 sec

The Power of Selecting Key Blocks with Local Pre-ranking for Long Document Information Retrieval [article]

Minghan Li, Diana Nicoleta Popa, Johan Chagnon, Yagmur Gizem Cinar, Eric Gaussier
2021 arXiv   pre-print
We follow here a slightly different approach in which one first selects key blocks of a long document by local query-block pre-ranking, and then few blocks are aggregated to form a short document that  ...  Experiments conducted on standard Information Retrieval datasets demonstrate the effectiveness of the proposed approach.  ...  This paper is an extension of the earlier short paper published at SIGIR 2021 and entitled: "KeyBLD: Selecting Key Blocks with Local Pre-ranking for Long Document Information Retrieval" [32].  ... 
arXiv:2111.09852v2 fatcat:iu6tm4xchzcufg43rx5lykawia

Multilingual Web retrieval: An experiment in English–Chinese business intelligence

Jialun Qin, Yilu Zhou, Michael Chau, Hsinchun Chen
2006 Journal of the American Society for Information Science and Technology  
Cross-language information retrieval (CLIR), the study of retrieving information in one language by queries expressed in another language, is a promising approach to the problem.  ...  As increasing numbers of non-English resources have become available on the Web, the interesting and important issue of how Web users can retrieve documents in different languages has arisen.  ...  We would also like to thank the AI Lab team members who developed the AI Lab SpidersRUs toolkit, the Mutual Information software, and the AZ Noun Phraser.  ... 
doi:10.1002/asi.20329 fatcat:kai3o7cidfdlvjcwkdkx2hwmzy

Privacy-Preserving Ranked Search on Public-Key Encrypted Data

Sahin Buyrukbilen, Spiridon Bakiras
2013 2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing  
These protocols allow users to upload encrypted versions of their documents to the cloud, while retaining the ability to query the database with traditional plaintext keyword queries.  ...  As a result, owners of sensitive information may be skeptical in purchasing such services, given the risks associated with the unauthorized access to their data.  ...  ACKNOWLEDGMENTS This research has been funded by the NSF CAREER Award IIS-0845262.  ... 
doi:10.1109/hpcc.and.euc.2013.33 dblp:conf/hpcc/BuyrukbilenB13 fatcat:wp3v4fykvneunjg5ga2to6odru

Simple Local Attentions Remain Competitive for Long-Context Tasks [article]

Wenhan Xiong, Barlas Oğuz, Anchit Gupta, Xilun Chen, Diana Liskovich, Omer Levy, Wen-tau Yih, Yashar Mehdad
2022 arXiv   pre-print
able to build a simpler and more efficient long-doc QA model that matches the performance of Longformer with half of its pretraining compute.  ...  For each attention variant, we pretrain large-size models using the same long-doc corpus and then finetune these models for real-world long-context tasks.  ...  For the retrieval task, for the ease of experiments, we reported the mean reciprocal rank on the dev set, which has been shown to correlate well with final retrieval metric like answer recall (Oguz  ... 
arXiv:2112.07210v2 fatcat:hyng2ppaezdrpafd5mv2mak5yi

Detecting Semantic Concepts from Video Using Temporal Gradients and Audio Classification [chapter]

Mika Rautiainen, Tapio Seppänen, Jani Penttilä, Johannes Peltola
2003 Lecture Notes in Computer Science  
Test runs and evaluations in TREC 2002 Video Track show consistent performance for Temporal Gradient Correlogram and state-of-the-art precision in audio-based instrumental sound detection.  ...  are maximum values for hue, saturation and value. A hexagonal SOM with 30x27 nodes was initialized and trained with sample HSV Sector Histograms extracted from 10x10 image regions of 82 images containing  ...  We like to thank the National Technology Agency of Finland (Tekes) and the Academy of Finland for supporting this research.  ... 
doi:10.1007/3-540-45113-7_26 fatcat:eaiziqooardjbhmnzubfvfjkny

Modeling Relevance Ranking under the Pre-training and Fine-tuning Paradigm [article]

Lin Bo, Liang Pang, Gang Wang, Jun Xu, XiuQiang He, Ji-Rong Wen
2021 arXiv   pre-print
Recently, pre-trained language models such as BERT have been applied to document ranking for information retrieval, which first pre-train a general language model on an unlabeled large corpus and then  ...  Specifically, to model the user's view of relevance, Pre-Rank pre-trains the initial query-document representations based on large-scale user activities data such as the click log.  ...  INTRODUCTION Relevance ranking, whose objective is to provide the right ranking order of a list of documents for a given query [26, 27], has played a vital role in the field of information retrieval  ... 
arXiv:2108.05652v1 fatcat:hiafpiym2jeqtdsanl52zfnrq4

Semantic Models for the First-stage Retrieval: A Comprehensive Review [article]

Yinqiong Cai, Yixing Fan, Jiafeng Guo, Fei Sun, Ruqing Zhang, Xueqi Cheng
2021 arXiv   pre-print
Unfortunately, these models suffer from the vocabulary mismatch problem, which may block re-ranking stages from relevant documents at the very beginning.  ...  Multi-stage ranking pipelines have been a practical solution in modern search systems, where the first-stage retrieval is to return a subset of candidate documents, and latter stages attempt to re-rank  ...  With the rise of more powerful pre-training neural networks (e.g., BERT, GPT-3), it is a natural way to combine them with the term-based models for improving the first-stage retrieval. Seo et al.  ... 
arXiv:2103.04831v3 fatcat:6qa7hvc3jve3pcmo2mo4qsiefq


Sendong Zhao, Chang Su, Andrea Sboner, Fei Wang
2019 Proceedings of the 28th ACM International Conference on Information and Knowledge Management - CIKM '19  
Learning to rank minimizes a ranking loss between biomedical articles with the query to learn the retrieval function.  ...  GRAPHENE consists of three main different modules 1) graph-augmented document representation learning; 2) query expansion and representation learning and 3) learning to rank biomedical articles.  ...  To make sure document representations can encode relevant local and non-local concepts, we design a decoder to see if key relevant information can be reproduced.  ... 
doi:10.1145/3357384.3358038 dblp:conf/cikm/ZhaoSSW19 fatcat:hpll7xork5em7jkoz4kz3m3gry

Pre-training Methods in Information Retrieval [article]

Yixing Fan, Xiaohui Xie, Yinqiong Cai, Jia Chen, Xinyu Ma, Xiangsheng Li, Ruqing Zhang, Jiafeng Guo
2022 arXiv   pre-print
The core of information retrieval (IR) is to identify relevant information from large-scale resources and return it as a ranked list to respond to the user's information need.  ...  In recent years, the resurgence of deep learning has greatly advanced this field and leads to a hot topic named NeuIR (i.e., neural information retrieval), especially the paradigm of pre-training methods  ... 
arXiv:2111.13853v3 fatcat:pilemnpphrgv5ksaktvctqdi4y

Rethinking Search: Making Experts out of Dilettantes [article]

Donald Metzler, Yi Tay, Dara Bahri, Marc Najork
2021 arXiv   pre-print
This paper examines how ideas from classical information retrieval and large pre-trained language models can be synthesized and evolved into systems that truly deliver on the promise of expert advice.  ...  not have a true understanding of the world, they are prone to hallucinating, and crucially they are incapable of justifying their utterances by referring to supporting documents in the corpus they were  ...  Reading Comprehension with Numerical Reasoning. In Proceedings of the 2019 [50] Tie-Yan Liu. 2009. Learning to Rank for Information Retrieval.  ... 
arXiv:2105.02274v1 fatcat:qdghlnv2nnfhnoo6eafdaxqxzy

Automatic Generation of Descriptive Titles for Video Clips Using Deep Learning [article]

Soheyla Amirian, Khaled Rasheed, Thiab R. Taha, Hamid R. Arabnia
2021 arXiv   pre-print
together with text summarization; and finally, a title and an abstract are generated for the video.  ...  Over the last decade, the use of Deep Learning in many applications produced results that are comparable to and in some cases surpassing human expert performance.  ...  Acknowledgement We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan V GPU used for this research.  ... 
arXiv:2104.03337v1 fatcat:djot6uybhzel5csgw64jn6pw6a

FreePub: Collecting and Organizing Scientific Material Using Mindmaps [article]

Theodore Dalamagas, Tryfon Farmakakis, Manolis Maragkakis, Artemis Hatzigeorgiou
2010 arXiv   pre-print
All retrieved information from FreePub can be imported and organized in mindmaps. FreePub has been fully implemented on top of FreeMind, a popular open-source, mindmapping tool.  ...  Mindmaps are visual, graph-based representations of concepts, ideas, notes, tasks, etc. They generally take a hierarchical or tree branch format, with ideas branching into their subsections.  ...  An abstract of the XML document that contains the information for one result is shown below: ...  ... 
arXiv:1012.1623v1 fatcat:4qlc5vjp3fdhnc3mytisbcwpde

Efficient Query Processing for Scalable Web Search

Nicola Tonellotto, Craig Macdonald, Iadh Ounis
2018 Foundations and Trends in Information Retrieval  
A key detail of the manner in which a search engine is designed to operate is the "top-heavy nature" of results: since the users of search engines typically focus on the top-ranked results (as can be measured  ...  Search engines are exceptionally important tools for accessing information in today's world.  ...  Acknowledgements We would like to thank Maarten de Rijke for his patience and encouragements during the preparation of this manuscript, as well as the three anonymous reviewers for their constructive suggestions  ... 
doi:10.1561/1500000057 fatcat:wx53qhvfhnfwfc4hgdva5ypw3u

Text summarization using unsupervised deep learning

Mahmood Yousefi-Azar, Len Hamey
2017 Expert systems with applications  
The ENAE can make further improvements, particularly in selecting informative sentences.  ...  ENAE is a stochastic version of an AE that adds noise to the input text and selects the top sentences from an ensemble of noisy runs.  ...  Although in this randomly selected email, there is no intersection between the top ranked sentences of Ltf-NAE and Ltf-ENAE, both models do extract important information from this long email thread.  ... 
doi:10.1016/j.eswa.2016.10.017 fatcat:czp3lppc5jh4hmpj6xxos5fxa4

Query Driven Algorithm Selection in Early Stage Retrieval

Joel Mackenzie, J. Shane Culpepper, Roi Blanco, Matt Crane, Charles L. A. Clarke, Jimmy Lin
2018 Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining - WSDM '18  
Scalable web search systems typically employ multi-stage retrieval architectures, where an initial stage generates a set of candidate documents that are then pruned and re-ranked.  ...  As a proof of concept, we use the prediction framework to help alleviate the problem of tail-latency queries in early stage retrieval.  ...  When a pivot document is found (by summing the U_t scores until the threshold is exceeded), the local block score is then used to refine the estimated score, that is, the sum of the U_{b,t} scores is computed  ... 
doi:10.1145/3159652.3159676 dblp:conf/wsdm/MackenzieCBCCL18 fatcat:x27r46ptwbh67py2ecsnzpiwsu