1,706 Hits in 4.8 sec

Revisiting N-Gram Based Models for Retrieval in Degraded Large Collections [chapter]

Javier Parapar, Ana Freire, Álvaro Barreiro
2009 Lecture Notes in Computer Science  
This paper presents an n-gram based distributed model for retrieval on large collections of degraded text.  ...  The traditional retrieval models based on term matching are not effective in collections of degraded documents (output of OCR or ASR systems for instance).  ...  We compared the presented approach, inspired by previous n-gram based retrieval methods, against a traditional term based vector space model.  ... 
doi:10.1007/978-3-642-00958-7_66 fatcat:lvd5zhlx6zfhdfeulqituzaxjm
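
To illustrate why term matching struggles on degraded text while character n-grams still find partial evidence, here is a minimal sketch. It is not the distributed model from the paper above: the function names `char_ngrams` and `ngram_overlap`, the trigram size, and the use of Jaccard overlap as the similarity measure are all assumptions chosen for illustration.

```python
def char_ngrams(text, n=3):
    """Return the set of character n-grams of a lower-cased string."""
    text = text.lower()
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def ngram_overlap(a, b, n=3):
    """Jaccard overlap between the character n-gram sets of two strings.

    Jaccard is an illustrative choice, not the scoring used in the paper.
    """
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a or not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# A hypothetical OCR error: the final "l" was read as "i".
clean, degraded = "retrieval", "retrievai"

# Exact term matching finds nothing in common.
print(clean == degraded)

# Character trigrams still share most of their evidence.
print(ngram_overlap(clean, degraded))
```

In this example a single misrecognized character breaks the exact match, but only one of the seven trigrams of each word is affected, so the overlap stays high.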

Combining compound and single terms under language model framework

Arezki Hammache, Mohand Boughanem, Rachid Ahmed-Ouamer
2013 Knowledge and Information Systems  
Particularly in language modeling approaches this extension is achieved through the use of the bigram or n-gram models.  ...  Most existing Information Retrieval models, including probabilistic and vector space models, are based on the term independence hypothesis.  ...  Acknowledgments We thank the editor and anonymous reviewers for their very useful comments and suggestions.  ... 
doi:10.1007/s10115-013-0618-x fatcat:wl5aoew5fnhsdfxoq4mvtx3ijy

Report on Thomson Legal and Regulatory Experiments at CLEF-2004 [chapter]

Isabelle Moulinier, Ken Williams
2005 Lecture Notes in Computer Science  
While the fertility-based approach picks good terms, it does not help improve bilingual retrieval. Pseudo-relevance feedback, on the other hand, resulted in improved average precision.  ...  We compared two translation models for query translation, and captured compound translations through fertility probabilities.  ...  Our stopword experiments confirmed well-established results about stopword removal and retrieval effectiveness.  ... 
doi:10.1007/11519645_11 fatcat:mibbn5vgnfaflpdrlka5bdfwia

Graph-based concept weighting for medical information retrieval

Bevan Koopman, Guido Zuccon, Peter Bruza, Laurianne Sitbon, Michael Lawley
2012 Proceedings of the Seventeenth Australasian Document Computing Symposium on - ADCS '12  
This paper presents a graph-based method to weight medical concepts in documents for the purposes of information retrieval.  ...  Medical concepts are extracted from free-text documents using a state-of-the-art technique that maps n-grams to concepts from the SNOMED CT medical ontology.  ...  In contrast, the term-based graph does not encode these n-grams: instead, the two terms are split as separate vertices, both receiving a lower weight.  ... 
doi:10.1145/2407085.2407096 dblp:conf/adcs/KoopmanZBSL12 fatcat:dt5nvii5g5aw7mxn2c3tvef224

Adaptive Representations for Tracking Breaking News on Twitter [article]

Igor Brigadir, Derek Greene, Pádraig Cunningham
2014 arXiv   pre-print
Therefore, there is considerable interest in developing filters for tweet streams in order to track and summarize stories.  ...  Assessments based on ROUGE metrics indicate that adaptive approaches are best suited for tracking evolving stories on Twitter.  ...  We thank Storyful for providing access to data, and early adopters of custom timelines who unknowingly contributed ground truth used in the evaluation.  ... 
arXiv:1403.2923v3 fatcat:55zfwx53kvhe3ou6xrdaiesvqi

Neural Networks Revisited for Proper Name Retrieval from Diachronic Documents [chapter]

Irina Illina, Dominique Fohr
2018 Lecture Notes in Computer Science  
Developing high-quality transcription systems for very large vocabulary corpora is a challenging task. Proper names are usually key to understanding the information contained in a document.  ...  In this paper, we extend the previously proposed neural networks for word embedding models: word vector representation proposed by Mikolov is enriched by an additional non-linear transformation.  ...  In order to incorporate the new PNs in the language model, we re-estimated it for each augmented vocabulary using the large text corpus described in Section 3.3.  ... 
doi:10.1007/978-3-319-93782-3_2 fatcat:32ec46qwyzbjznrmhtvrxwqzry

Duplicate-Search-Based Image Annotation Using Web-Scale Data

Xin-Jing Wang, Lei Zhang, Wei-Ying Ma
2012 Proceedings of the IEEE  
Annotation of images on the Web, based on label propagation over similar images and social information, is discussed in this paper; a system called Arista is used to demonstrate scalability.  ...  Easy photo-taking and photo-sharing today make image an increasingly important type of media in people's everyday life, which arouses a growing demand for a practical  ...  Zhang for their work on the Arista system.  ... 
doi:10.1109/jproc.2012.2193109 fatcat:fox4i4n53bhp3kwuekocwziua4

The data deluge: Challenges and opportunities of unlimited data in statistical signal processing

Michael L. Seltzer, Lei Zhang
2009 2009 IEEE International Conference on Acoustics, Speech and Signal Processing  
We then describe recent work in spoken language processing and image processing that has begun to address these challenges in order to tackle large-scale classification tasks.  ...  When viewed through the lens of pattern classification, this content can be seen as a virtually unlimited supply of training data for various statistical modeling and labeling tasks such as speech recognition  ...  Both of these used the web to estimate n-gram counts for a LM for the news domain.  ... 
doi:10.1109/icassp.2009.4960430 dblp:conf/icassp/SeltzerZ09 fatcat:ifxs3ob52zfyjinrhlor3s7qya

Towards Debiasing Fact Verification Models

Tal Schuster, Darsh Shah, Yun Jie Serene Yeo, Daniel Roberto Filizzola Ortiz, Enrico Santus, Regina Barzilay
2019 Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)  
In this paper, we investigate the cause of this phenomenon, identifying strong cues for predicting labels solely based on the claim, without considering any evidence.  ...  This work is a step towards a more sound evaluation of reasoning capabilities in fact verification models.  ...  Acknowledgments We thank the MIT NLP group and the reviewers for their helpful discussion and comments. This work is supported by DSO grant DSOCL18002.  ... 
doi:10.18653/v1/d19-1341 dblp:conf/emnlp/SchusterSYFSB19 fatcat:m3y2lys4y5ebtdxrdgazrmjsxe

Remedies against the Vocabulary Gap in Information Retrieval [article]

Christophe Van Gysel
2017 arXiv   pre-print
More specifically, we propose (1) methods to formulate an effective query from complex textual structures and (2) latent vector space models that circumvent the vocabulary gap in information retrieval.  ...  While term-based approaches are intuitive and effective in practice, they are based on the hypothesis that documents that exactly contain the query terms are highly relevant regardless of query semantics  ...  This is infeasible during model training when the collection of retrievable objects becomes too large, as is the case for product search.  ... 
arXiv:1711.06004v1 fatcat:6vkhvfby3zbzrepgopunm7gie4

Efficient Cost-Aware Cascade Ranking in Multi-Stage Retrieval

Ruey-Cheng Chen, Luke Gallagher, Roi Blanco, J. Shane Culpepper
2017 Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR '17  
Complex machine learning models are now an integral part of modern, large-scale retrieval systems.  ...  However, collection size growth continues to outpace advances in efficiency improvements in the learning models which achieve the highest effectiveness.  ...  Table 5 shows the complete feature breakdown based on the two main categories of features used. The first set of features are a large collection of pre-retrieval features commonly used for predicting query  ... 
doi:10.1145/3077136.3080819 dblp:conf/sigir/ChenGBC17 fatcat:saelzzrjpjb2jk6zoqpeoxpf5m

Translation techniques in cross-language information retrieval

Dong Zhou, Mark Truran, Tim Brailsford, Vincent Wade, Helen Ashman
2012 ACM Computing Surveys  
Translation is therefore a pivotal activity for CLIR engines. Over the last 15 years, the CLIR community has developed a wide range of techniques and models supporting free text translation.  ...  Cross-language information retrieval (CLIR) is an active sub-domain of information retrieval (IR).  ...  ACKNOWLEDGMENTS This research was partially supported by a PHD scholarship from the University of Nottingham and funding from the Science Foundation Ireland (Grant 07/CE/I1142) as part of the Centre for  ... 
doi:10.1145/2379776.2379777 fatcat:mu5p5djufjghvn3xjppekmwnwu

Deep Reinforced Query Reformulation for Information Retrieval [article]

Xiao Wang, Craig Macdonald, Iadh Ounis
2020 arXiv   pre-print
In addition, to evaluate the quality of the reformulated query in the context of information retrieval, we first train our DRQR model, then apply the retrieval ranking model on the obtained reformulated  ...  Query reformulations have long been a key mechanism to alleviate the vocabulary-mismatch problem in information retrieval, for example by expanding the queries with related query terms or by generating  ...  Higher IDF values indicate that a term is infrequent and helps to guide the retrieval process. $\mathrm{idf}(t) = \log \frac{N}{N_t}$ (14) where $N$ is the number of documents in the whole collection and $N_t$ is the number  ... 
arXiv:2007.07987v1 fatcat:gksqmihufvgcfhao4pjumyjphi
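
The IDF formula quoted in the snippet above, log(N / N_t), can be sketched in a few lines. The snippet is cut off before it defines N_t, so this sketch assumes the usual reading: N_t is the number of documents that contain term t. The function name `idf` and the toy collection are hypothetical, and the natural logarithm is an assumption since the snippet does not state a base.

```python
import math


def idf(term, documents):
    """Inverse document frequency log(N / N_t).

    N is the number of documents in the collection. N_t is assumed to be
    the number of documents containing `term` (the source snippet is
    truncated before defining it).
    """
    n_docs = len(documents)
    n_containing = sum(1 for doc in documents if term in doc)
    if n_containing == 0:
        raise ValueError(f"term {term!r} does not occur in the collection")
    return math.log(n_docs / n_containing)


# Toy collection: each document is a set of terms.
collection = [
    {"query", "retrieval"},
    {"query", "expansion"},
    {"query", "ranking"},
    {"reformulation", "ranking"},
]

# "query" occurs in 3 of 4 documents, "expansion" in only 1 of 4,
# so the rarer term receives the higher IDF.
print(idf("query", collection))
print(idf("expansion", collection))
```

Raising an error for unseen terms is one design choice; smoothed variants that add a constant to the counts are also common, but they are not what the quoted formula states.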

AirLoop: Lifelong Loop Closure Detection [article]

Dasong Gao, Chen Wang, Sebastian Scherer
2022 arXiv   pre-print
It is therefore desirable to incorporate the data newly collected during operation for incremental learning.  ...  In this paper, we present AirLoop, a method that leverages techniques from lifelong learning to minimize forgetting when training loop closure detection models incrementally.  ...  In deep learning-based methods, a CNN is trained to generate vector-valued global descriptors for each image, and matching images are retrieved based on descriptor similarities.  ... 
arXiv:2109.08975v3 fatcat:7iwyf7mpmnaltmk4gluwekvjie

Image Retrieval Based on Anisotropic Scaling and Shearing Invariant Geometric Coherence

Xiaomeng Wu, Kunio Kashino
2014 2014 22nd International Conference on Pattern Recognition  
Imposing a spatial coherence constraint on image matching is becoming a necessity for local feature based object retrieval.  ...  Extensive experimentation and comparisons with state-of-the-art spatial coherence models demonstrate the superiority of our approach in image retrieval tasks.  ...  INTRODUCTION The Bag-Of-Words (BOW) model based on local features has been shown to be successful in object retrieval.  ... 
doi:10.1109/icpr.2014.677 dblp:conf/icpr/WuK14 fatcat:ql4svvonzbctllalukfp453yxm
Showing results 1 to 15 out of 1,706 results