Revisiting N-Gram Based Models for Retrieval in Degraded Large Collections
[chapter]
2009
Lecture Notes in Computer Science
This paper presents an n-gram based distributed model for retrieval on large collections of degraded text. ...
The traditional retrieval models based on term matching are not effective in collections of degraded documents (output of OCR or ASR systems for instance). ...
We compared the presented approach, inspired by previous n-gram based retrieval methods, against a traditional term based vector space model. ...
doi:10.1007/978-3-642-00958-7_66
fatcat:lvd5zhlx6zfhdfeulqituzaxjm
Combining compound and single terms under language model framework
2013
Knowledge and Information Systems
Particularly in language modeling approaches this extension is achieved through the use of the bigram or n-gram models. ...
Most existing Information Retrieval models, including probabilistic and vector space models, are based on the term independence hypothesis. ...
Acknowledgments We thank the editor and anonymous reviewers for their very useful comments and suggestions. ...
doi:10.1007/s10115-013-0618-x
fatcat:wl5aoew5fnhsdfxoq4mvtx3ijy
Report on Thomson Legal and Regulatory Experiments at CLEF-2004
[chapter]
2005
Lecture Notes in Computer Science
While the fertility-based approach picks good terms, it does not help improve bilingual retrieval. Pseudo-relevance feedback, on the other hand, resulted in improved average precision. ...
We compared two translation models for query translation, and captured compound translations through fertility probabilities. ...
Our stopword experiments confirmed well-established results about stopword removal and retrieval effectiveness. ...
doi:10.1007/11519645_11
fatcat:mibbn5vgnfaflpdrlka5bdfwia
Graph-based concept weighting for medical information retrieval
2012
Proceedings of the Seventeenth Australasian Document Computing Symposium on - ADCS '12
This paper presents a graph-based method to weight medical concepts in documents for the purposes of information retrieval. ...
Medical concepts are extracted from free-text documents using a state-of-the-art technique that maps n-grams to concepts from the SNOMED CT medical ontology. ...
In contrast, the term-based graph does not encode these n-grams: instead, the two terms are split as separate vertices, both receiving a lower weight. ...
doi:10.1145/2407085.2407096
dblp:conf/adcs/KoopmanZBSL12
fatcat:dt5nvii5g5aw7mxn2c3tvef224
Adaptive Representations for Tracking Breaking News on Twitter
[article]
2014
arXiv
pre-print
Therefore, there is considerable interest in developing filters for tweet streams in order to track and summarize stories. ...
Assessments based on ROUGE metrics indicate that adaptive approaches are best suited for tracking evolving stories on Twitter. ...
We thank Storyful for providing access to data, and early adopters of custom timelines who unknowingly contributed ground truth used in the evaluation. ...
arXiv:1403.2923v3
fatcat:55zfwx53kvhe3ou6xrdaiesvqi
Neural Networks Revisited for Proper Name Retrieval from Diachronic Documents
[chapter]
2018
Lecture Notes in Computer Science
Developing high-quality transcription systems for very large vocabulary corpora is a challenging task. Proper names are usually key to understanding the information contained in a document. ...
In this paper, we extend the previously proposed neural networks for word embedding models: word vector representation proposed by Mikolov is enriched by an additional non-linear transformation. ...
In order to incorporate the new PNs in the language model, we re-estimated it for each augmented vocabulary using the large text corpus described in Section 3.3. ...
doi:10.1007/978-3-319-93782-3_2
fatcat:32ec46qwyzbjznrmhtvrxwqzry
Duplicate-Search-Based Image Annotation Using Web-Scale Data
2012
Proceedings of the IEEE
Annotation of images on the Web, based on label propagation over similar images and social information, is discussed in this paper; a system called Arista is used to demonstrate scalability. ...
Easy photo-taking and photo-sharing today make image an increasingly important type of media in people's everyday life, which arouses a growing demand for a practical ...
Zhang for their work on the Arista system. ...
doi:10.1109/jproc.2012.2193109
fatcat:fox4i4n53bhp3kwuekocwziua4
The data deluge: Challenges and opportunities of unlimited data in statistical signal processing
2009
2009 IEEE International Conference on Acoustics, Speech and Signal Processing
We then describe recent work in spoken language processing and image processing that has begun to address these challenges in order to tackle large-scale classification tasks. ...
When viewed through the lens of pattern classification, this content can be seen as a virtually unlimited supply of training data for various statistical modeling and labeling tasks such as speech recognition ...
Both of these used the web to estimate n-gram counts for a LM for the news domain. ...
doi:10.1109/icassp.2009.4960430
dblp:conf/icassp/SeltzerZ09
fatcat:ifxs3ob52zfyjinrhlor3s7qya
Towards Debiasing Fact Verification Models
2019
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
In this paper, we investigate the cause of this phenomenon, identifying strong cues for predicting labels solely based on the claim, without considering any evidence. ...
This work is a step towards a more sound evaluation of reasoning capabilities in fact verification models. ...
Acknowledgments We thank the MIT NLP group and the reviewers for their helpful discussion and comments. This work is supported by DSO grant DSOCL18002. ...
doi:10.18653/v1/d19-1341
dblp:conf/emnlp/SchusterSYFSB19
fatcat:m3y2lys4y5ebtdxrdgazrmjsxe
Remedies against the Vocabulary Gap in Information Retrieval
[article]
2017
arXiv
pre-print
More specifically, we propose (1) methods to formulate an effective query from complex textual structures and (2) latent vector space models that circumvent the vocabulary gap in information retrieval. ...
While term-based approaches are intuitive and effective in practice, they are based on the hypothesis that documents that exactly contain the query terms are highly relevant regardless of query semantics ...
This is infeasible during model training when the collection of retrievable objects becomes too large, as is the case for product search. ...
arXiv:1711.06004v1
fatcat:6vkhvfby3zbzrepgopunm7gie4
Efficient Cost-Aware Cascade Ranking in Multi-Stage Retrieval
2017
Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR '17
Complex machine learning models are now an integral part of modern, large-scale retrieval systems. ...
However, collection size growth continues to outpace advances in efficiency improvements in the learning models which achieve the highest effectiveness. ...
Table 5 shows the complete feature breakdown based on the two main categories of features used. The first set of features are a large collection of pre-retrieval features commonly used for predicting query ...
doi:10.1145/3077136.3080819
dblp:conf/sigir/ChenGBC17
fatcat:saelzzrjpjb2jk6zoqpeoxpf5m
Translation techniques in cross-language information retrieval
2012
ACM Computing Surveys
Translation is therefore a pivotal activity for CLIR engines. Over the last 15 years, the CLIR community has developed a wide range of techniques and models supporting free text translation. ...
Cross-language information retrieval (CLIR) is an active sub-domain of information retrieval (IR). ...
ACKNOWLEDGMENTS This research was partially supported by a PHD scholarship from the University of Nottingham and funding from the Science Foundation Ireland (Grant 07/CE/I1142) as part of the Centre for ...
doi:10.1145/2379776.2379777
fatcat:mu5p5djufjghvn3xjppekmwnwu
Deep Reinforced Query Reformulation for Information Retrieval
[article]
2020
arXiv
pre-print
In addition, to evaluate the quality of the reformulated query in the context of information retrieval, we first train our DRQR model, then apply the retrieval ranking model on the obtained reformulated ...
Query reformulations have long been a key mechanism to alleviate the vocabulary-mismatch problem in information retrieval, for example by expanding the queries with related query terms or by generating ...
Higher IDF values indicate that a term is infrequent and helps to guide the retrieval process. $\mathrm{idf}(t) = \log \frac{N}{N_t}$ (14) where $N$ is the number of documents in the whole collection and $N_t$ is the number ...
arXiv:2007.07987v1
fatcat:gksqmihufvgcfhao4pjumyjphi
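The IDF weighting quoted in the snippet above can be illustrated with a minimal sketch. The snippet is truncated before it defines N_t, so this example assumes the usual reading (the number of documents that contain term t). The function name `idf_table` and the toy documents are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def idf_table(documents):
    """Return {term: log(N / N_t)} for a list of tokenized documents.

    Assumption: N_t is the number of documents containing the term;
    the quoted snippet is cut off before it states this.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        # Use a set so each term is counted at most once per document.
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / n_t) for term, n_t in doc_freq.items()}


# Hypothetical toy collection of three tokenized documents.
docs = [
    ["query", "reformulation", "retrieval"],
    ["query", "expansion"],
    ["query", "retrieval", "ranking"],
]
idf = idf_table(docs)
```

In this toy collection a term that occurs in every document gets an IDF of zero, while a term found in a single document gets the largest value, which matches the snippet's remark that higher IDF marks infrequent terms.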
AirLoop: Lifelong Loop Closure Detection
[article]
2022
arXiv
pre-print
It is therefore desirable to incorporate the data newly collected during operation for incremental learning. ...
In this paper, we present AirLoop, a method that leverages techniques from lifelong learning to minimize forgetting when training loop closure detection models incrementally. ...
In deep learning-based methods, a CNN is trained to generate vector-valued global descriptors for each image, and matching images are retrieved based on descriptor similarities. ...
arXiv:2109.08975v3
fatcat:7iwyf7mpmnaltmk4gluwekvjie
Image Retrieval Based on Anisotropic Scaling and Shearing Invariant Geometric Coherence
2014
2014 22nd International Conference on Pattern Recognition
Imposing a spatial coherence constraint on image matching is becoming a necessity for local feature based object retrieval. ...
Extensive experimentation and comparisons with state-of-the-art spatial coherence models demonstrate the superiority of our approach in image retrieval tasks. ...
The Bag-Of-Words (BOW) model based on local features has been shown to be successful in object retrieval. ...
doi:10.1109/icpr.2014.677
dblp:conf/icpr/WuK14
fatcat:ql4svvonzbctllalukfp453yxm