Effects of OCR errors on ranking and feedback using the vector space model

Kazem Taghva, Julie Borsack, Allen Condit
1996 Information Processing & Management  
We report on the performance of the vector space model in the presence of OCR errors.  ...  We do see divergence though between the relevant document rankings of the OCR and corrected collections with different weighting combinations.  ...  We would also like to thank the anonymous referees for their thorough reading of this paper. Their comments and suggestions have greatly improved the quality of this work.  ... 
doi:10.1016/0306-4573(95)00058-5 fatcat:tqd2ghcrerf3zml7guqsjhp25y

A Statistical Approach to Automatic OCR Error Correction in Context

Xiang Tong, David A. Evans
1996 Workshop on Very Large Corpora  
This paper describes an automatic, context-sensitive, word-error correction system based on statistical language modeling (SLM) as applied to optical character recognition (OCR) postprocessing.  ...  Finally, the word-bigram model and Viterbi algorithm are used to determine the best scoring word sequence for the sentence.  ...  Acknowledgements We thank Nataša Milić-Frayling and an anonymous reviewer for their excellent comments on an earlier version of this paper.  ... 
dblp:conf/acl-vlc/TongE96 fatcat:njambt22qvdnvp3atehpqebwjy

Evaluation of model-based retrieval effectiveness with OCR text

Kazem Taghva, Julie Borsack, Allen Condit
1996 ACM Transactions on Information Systems  
We also demonstrate that the ranking and feedback methods associated with these models are generally not robust enough to deal with OCR errors.  ...  We give a comprehensive report on our experiments with retrieval from OCR-generated text using systems based on standard models of retrieval.  ...  During this period, we have had the privilege of discussing these projects with a number of our colleagues and improving them based on their input.  ... 
doi:10.1145/214174.214180 fatcat:k4wstpyeffg4nikyctppu33rou

Information access in the presence of OCR errors

Kazem Taghva, Thomas Nartker, Julie Borsack
2004 Proceedings of the 1st ACM workshop on Hardcopy document processing - HDP '04  
Over the last 15 years, the Information Science Research Institute (ISRI) at the University of Nevada, Las Vegas (UNLV) has conducted information access research in the presence of OCR errors.  ...  Our research has focused on issues associated with the construction of large document databases. In this paper, we will highlight our findings and detail our current activities.  ...  For example, we found in the vector space model that cosine normalization negatively affects document ranking.  ... 
doi:10.1145/1031442.1031443 fatcat:3zoiskjkibcq3mcfb7bsv5qrua

Scalable ranked retrieval using document images

Rajiv Jain, Douglas W. Oard, David Doermann, Bertrand Coüasnon, Eric K. Ringger
2013 Document Recognition and Retrieval XXI  
To minimize the storage cost and computational requirements of this matching, the SURF feature vector is reduced to 8 dimensions using PCA.  ...  The best method to perform ranked retrieval on a large corpus of document images, however, remains an open research question.  ...  Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.  ... 
doi:10.1117/12.2038656 dblp:conf/drr/JainOD14 fatcat:fakph7vukvglnanvqfjadbx42a

Retrieving poorly degraded OCR documents

Y. Fataicha, M. Cheriet, J. Y. Nie, C. Y. Suen
2005 International Journal on Document Analysis and Recognition  
query, based on a vector space IR model.  ...  This paper uses an automatic approach to examine the selection and the effectiveness of searching techniques for possible erroneous terms for query expansion.  ...  They showed the effects of OCR errors on ranking and feedback using the vector space model.  ... 
doi:10.1007/s10032-005-0147-6 fatcat:sh7rpswuqjfmdaptnbggyxtlxq

TRECVID 2004 Search and Feature Extraction Task by NUS PRIS

Tat-Seng Chua, Shi-Yong Neo, Keya Li, Gang Wang, Rui Shi, Ming Zhao, Huaxin Xu
2004 TREC Video Retrieval Evaluation  
From these categories, we induce a number of constraints on the search process, including: (a) the type of multi-modality features to use or emphasize; (b) the key concept terms in text query to use; and  ...  The results on 60 hours of test video from TRECVID 2004 evaluation demonstrate that our approaches are effective.  ...  Acknowledgments The authors would like to thank the Institute for Infocomm Research (I2R) for the support of the research project "Intelligent Media and Information Processing" (R-252-000-157-593), under  ... 
dblp:conf/trecvid/ChuaNLWS0X04 fatcat:6svu7lpqqvgz5ff7d65mgjmqqm

The Effects of Positive and Negative Online Customer Reviews: Do Brand Strength and Category Maturity Matter?

Nga N. Ho-Dac, Stephen J. Carson, William L. Moore
2013 Journal of Marketing  
This creates a positive feedback loop between sales and positive OCRs for models of weak brands that not only helps their sales but also increases overall brand equity, benefiting all models of the brand  ...  In contrast, OCRs have no significant impact on the sales of the models of strong brands, although these models do receive a significant sales boost from their greater brand equity.  ...  After we have determined the strong brands, we use the following model-level equation to estimate the main and inter action effects of OCRs and brand equity on model sales rank: where, in addition to the  ... 
doi:10.1509/jm.11.0011 fatcat:rsgr6myodrazzfnujeokuqlgza

Zero-shot video retrieval using content and concepts

Jeffrey Dalton, James Allan, Pranav Mirajkar
2013 Proceedings of the 22nd ACM international conference on Conference on information & knowledge management - CIKM '13  
Recent research in video retrieval has been successful at finding videos when the query consists of tens or hundreds of sample relevant videos for training supervised models.  ...  With relevance feedback, our approach provides additional improvements of over 50%.  ...  [18] use the vector space model to match text queries to concept descriptions which are used for identifying relevant videos. Neo et al.  ... 
doi:10.1145/2505515.2507880 dblp:conf/cikm/DaltonAM13 fatcat:zgekksitfrfblkrv7ldn5w4tba

OCR binarization and image pre-processing for searching historical documents

Maya R. Gupta, Nathaniel P. Jacobson, Eric K. Garcia
2007 Pattern Recognition  
The OCR in the ABBYY FineReader 7.1 SDK is used as a black box metric to compare methods.  ...  Results for 12 pages from six newspapers of differing quality show that performance varies widely by image, but that the classic Otsu method and Otsu-based methods perform best on average.  ...  Error diffusion is usually implemented as a raster scan process, so one considers an image to be a long vector where the rows are concatenated and read into the vector from left to right.  ... 
doi:10.1016/j.patcog.2006.04.043 fatcat:gqax7ani5vda3nmdpyhmfbpsfe

Probabilistic Approaches to Video Retrieval

Tzvetanka I. Ianeva, Liudmila V. Boldareva, Thijs Westerveld, Roberto Cornacchia, Djoerd Hiemstra, Arjen P. de Vries
2004 TREC Video Retrieval Evaluation  
Our experiments for TRECVID 2004 further investigate the applicability of the so-called "Generative Probabilistic Models" to video retrieval.  ...  TRECVID 2003 results demonstrated that mixture models com-* supported by Valencian government grants GV CTBPRR/2002/21 and CTESPRR/2004/003 LL-I-comb-TV combination at search-time: visual-based (same as  ...  This emphasizes usefulness of adding OCR information, because for instance an interactive retrieval system relies on relevance feedback on few shots from the top, that are useful to the user.  ... 
dblp:conf/trecvid/IanevaBWCHV04 fatcat:2cxkit6nhja73lkd5uwth3bkh4

Unconstrained handwritten document retrieval

Huaigu Cao, Venu Govindaraju, Anurag Bhardwaj
2010 International Journal on Document Analysis and Recognition  
The second method uses the uncorrected or raw OCR'ed text but modifies the standard vector space model for handling noisy text issues.  ...  The first method uses a novel bootstrapping mechanism to refine the OCR'ed text and uses the cleaned text for retrieval.  ...  Classic vector model: In the classic Vector Model [1], the documents are represented by the vector space of terms. A term is a word from the vocabulary of all of the documents.  ... 
doi:10.1007/s10032-010-0139-z fatcat:j5emrkltifcgxin3xiybxgsbwq

Online Similarity Learning with Feedback for Invoice Line Item Matching [article]

Chandresh Kumar Maurya, Neelamadhav Gantayat, Sampath Dechu, Tomas Horvath
2020 arXiv   pre-print
If the agent's feedback is in the form of a relative ranking between descriptions, we use similarity ranking algorithm.  ...  We showcase the comparative effectiveness and efficiency of the proposed approaches over many benchmarks and real-world data sets.  ...  ED 18-1-2019-0030 (Application-specific highly reliable IT solutions) has been implemented with the support provided from the National Research, Development and Innovation Fund of Hungary, financed under  ... 
arXiv:2001.00288v2 fatcat:cbhjjk2sevbtraxbyeuo4jl6tu

Survey of Post-OCR Processing Approaches

Thi-Tuyet-Hai Nguyen, Adam Jatowt, Mickael Coustaty, Antoine Doucet
2021 Zenodo  
Optical character recognition (OCR) is one of the most popular techniques used for converting printed documents into machine-readable ones.  ...  We then define the post-OCR processing problem, illustrate its typical pipeline, and review the state-of-the-art post-OCR processing approaches.  ...  This type of error accounts for around 18% of OCR errors [114] and happens when OCR software incorrectly recognizes spaces in documents.  ... 
doi:10.5281/zenodo.4635569 fatcat:x5qoluap7rgyxakv5lm5qcysya

Evaluation of Query Formulations in the Negotiated Query Refinement Process of Legal e-Discovery: UMKC at TREC 2007 Legal Track

Feng C. Zhao, Yugyung Lee, Deep Medhi
2007 Text Retrieval Conference  
For our study, we considered three sets of paired runs in vector space model and language model respectively.  ...  This result provided us an insight into the query negotiation process and a new direction to refine queries.  ...  A customized stop words list of 1,236 items was used to reduce the index size and to clean the OCR error.  ... 
dblp:conf/trec/ZhaoLM07 fatcat:6n5ni4wefrdwhlqz3fvgkqqz5i