5,363 Hits in 3.1 sec

Evaluation of model-based retrieval effectiveness with OCR text

Kazem Taghva, Julie Borsack, Allen Condit
1996 ACM Transactions on Information Systems  
We give a comprehensive report on our experiments with retrieval from OCR-generated text using systems based on standard models of retrieval.  ...  We also demonstrate that the ranking and feedback methods associated with these models are generally not robust enough to deal with OCR errors.  ...  During this period, we have had the privilege of discussing these projects with a number of our colleagues and improving them based on their input.  ... 
doi:10.1145/214174.214180 fatcat:k4wstpyeffg4nikyctppu33rou

Maryland at FIRE 2011: Retrieval of OCR'd Bengali [chapter]

Utpal Garain, David S. Doermann, Douglas W. Oard
2013 Lecture Notes in Computer Science  
In this year's Forum for Information Retrieval Evaluation (FIRE), the University of Maryland participated in the Retrieval of Indic Script OCRed Text (RISOT) task to experiment with the retrieval of Bengali  ...  The experiments focused on evaluating a retrieval strategy motivated by recent work on Cross-Language Information Retrieval (CLIR), but which makes use of OCR error modeling rather than parallel text alignment  ...  Acknowledgment: Help from Jiaul Paik of CVPRU, Indian Statistical Institute, Kolkata, India for implementing some parts of the present experiment is sincerely acknowledged.  ... 
doi:10.1007/978-3-642-40087-2_20 fatcat:goisgrjz5rfbvg4e67n7lmxqvu

A Word & Character N-Gram based Arabic OCR Error Simulation model

Mostafa Ezzat, Tarek Ahmed ElGhazaly, Mervat Gheith
2013 International Journal of Computers & Technology  
This paper provides a new model aimed at enhancing retrieval effectiveness for degraded Arabic OCR text.  ...  The retrieval effectiveness of the new model is 93%, while the best effectiveness published for the word-based approach was 84% and the best effectiveness for the character-based approach was 56%.  ...  and a set of experiments that were designed to identify the effect of the proposed model on retrieval effectiveness.  ... 
doi:10.24297/ijct.v12i8.2999 fatcat:b3fk3peepfhe5fjuaxfldn25hy

The effects of noisy data on text retrieval

Kazem Taghva, Julie Borsack, Allen Condit, Srinivas Erva
1994 Journal of the American Society for Information Science  
In particular, an OCR generated database and its corresponding 99.8% correct version are used to process a set of queries to determine the effect the degraded version will have on retrieval.  ...  It is shown that with the set of scientific documents we use in our testing, the effect is insignificant.  ...  For an OCR device, the input page can be used as a measure of the effectiveness of the method used to produce the output; however, this same kind of evaluation does not transfer to text retrieval.  ... 
doi:10.1002/(sici)1097-4571(199401)45:1<50::aid-asi6>3.0.co;2-b fatcat:qtfxbrvjpvduleebuge4lk2kaq

Overview of the FIRE 2011 RISOT Task [chapter]

Utpal Garain, Jiaul H. Paik, Tamaltaru Pal, Prasenjit Majumder, David S. Doermann, Douglas W. Oard
2013 Lecture Notes in Computer Science  
RISOT was a pilot task in FIRE 2011 which focused on the retrieval of automatically recognized text from machine printed sources.  ...  Two teams participated, submitting a total of 11 monolingual runs.  ...  The track has three primary goals: (1) supporting experimentation of retrieval from printed documents, (2) evaluating IR effectiveness for retrieval based on Indic script OCR, and (3) providing a venue  ... 
doi:10.1007/978-3-642-40087-2_19 fatcat:ju2c5ojivndx7f6v367md7g4iu

Video retrieval using speech and image information

Alexander G. Hauptmann, Rong Jin, Tobun D. Ng, Minerva M. Yeung, Rainer W. Lienhart, Chung-Sheng Li
2003 Storage and Retrieval for Media Databases 2003  
This paper provides an evaluation on the effects of different types of information used for video retrieval from a video collection.  ...  For the queries used in this evaluation, image matching and video OCR proved to be the deciding aspects of video information retrieval.  ...  While most of the evaluations were concerned with text retrieval, there have also been evaluations of document collections with OCR errors and spoken document collections that include speech recognition  ... 
doi:10.1117/12.479747 dblp:conf/spieSR/HauptmannJN03 fatcat:tusllilhvvdifmoevwextb3bke

Zero-shot video retrieval using content and concepts

Jeffrey Dalton, James Allan, Pranav Mirajkar
2013 Proceedings of the 22nd ACM international conference on Conference on information & knowledge management - CIKM '13  
We find that concept-based retrieval significantly outperforms text based approaches in recall.  ...  Recent research in video retrieval has been successful at finding videos when the query consists of tens or hundreds of sample relevant videos for training supervised models.  ...  OCR/ASR with the recall of concept-based retrieval.  ... 
doi:10.1145/2505515.2507880 dblp:conf/cikm/DaltonAM13 fatcat:zgekksitfrfblkrv7ldn5w4tba

OCR quality affects perceived usefulness of historical newspaper clippings – a user study [article]

Kimmo Kettunen, Heikki Keskustalo, Sanna Kumpulainen, Tuula Pääkkönen, Juha Rautiainen
2022 arXiv   pre-print
Such studies have either focused on the effects of artificially degraded OCR quality (see, e.g., [1-2]) or utilized test collections containing texts based on authentic low quality OCR data (see, e.g.,  ...  Effects of Optical Character Recognition (OCR) quality on historical information retrieval have so far been studied in data-oriented scenarios regarding the effectiveness of retrieval results.  ...  Faculty of Information Technology and Communication Sciences of the Tampere University took part in the arrangement of the query sessions and evaluation of the results as part of the Project EVOLUZ (#326616  ... 
arXiv:2203.03557v1 fatcat:wdiqq5zocvcctc2gzzapmengvi

Revisiting N-Gram Based Models for Retrieval in Degraded Large Collections [chapter]

Javier Parapar, Ana Freire, Álvaro Barreiro
2009 Lecture Notes in Computer Science  
The traditional retrieval models based on term matching are not effective in collections of degraded documents (the output of OCR or ASR systems, for instance).  ...  This paper presents an n-gram based distributed model for retrieval on large collections of degraded text.  ...  Conclusions and Future Work The work presented here tries to minimise the effect of text degradation in the traditional term-based retrieval models.  ... 
doi:10.1007/978-3-642-00958-7_66 fatcat:lvd5zhlx6zfhdfeulqituzaxjm

Multi-modal information retrieval from broadcast video using OCR and speech recognition

Alexander G. Hauptmann, Rong Jin, Tobun Dorbin Ng
2002 Proceedings of the second ACM/IEEE-CS joint conference on Digital libraries - JCDL '02  
OCR and speech recognition are compared on the 2001 TREC Video Retrieval evaluation corpus. Results show that OCR is more important than speech recognition for video retrieval.  ...  OCR retrieval can further improve through dictionary-based post-processing. We demonstrate how to utilize imperfect multi-modal metadata results to benefit multi-modal information retrieval.  ...  Interestingly enough, combining the n-gram post-processed OCR with the speech transcripts (ARR of 5.11% and recall of 16.07%) did not improve the retrieval effectiveness.  ... 
doi:10.1145/544220.544252 dblp:conf/jcdl/HauptmannJN02 fatcat:3z5iw5e2gvadxm5xpzeoebjn3i
doi:10.1145/544247.544252 fatcat:uj2dshjeifgqvn7fwdaufnkfx4

Chinese document image retrieval based on recognition candidates

Xuhui Jia, Yong Xia, Rui Zhou, Hongwei Liang
2012 2012 9th International Conference on Fuzzy Systems and Knowledge Discovery  
Because of the low recognition rate for degraded Chinese documents, retrieval performance is poor when it is based directly on the OCR result.  ...  In this paper, an indexing method with n-grams and recognition candidates is proposed to improve retrieval performance.  ...  a series of studies to identify the effects of OCR errors on text retrieval using different weighting schemes.  ... 
doi:10.1109/fskd.2012.6233763 dblp:conf/fskd/JiaX0L12 fatcat:z63tht3tnrdlzpeja2qhdt5tne

Efficient Media Retrieval from Non-Cooperative Queries [article]

Kevin Shih, Wei Di, Vignesh Jagadeesh, Robinson Piramuthu
2014 arXiv   pre-print
accuracy over that of either VLAD or the text alone.  ...  Finally, we demonstrate how to use this text-matching as a feature in conjunction with popular retrieval features such as VLAD using a simple learning setup to achieve significant improvements in retrieval  ...  However, our primary goal was to demonstrate the effectiveness of text-based information in realistic retrieval settings.  ... 
arXiv:1411.5307v1 fatcat:bix3ktvt7fay5hap3q5olxn4h4

Assessing the Impact of OCR Quality on Downstream NLP Tasks

Daniel van Strien, Kaspar Beelen, Mariona Ardanuy, Kasra Hosseini, Barbara McGillivray, Giovanni Colavizza
2020 Proceedings of the 12th International Conference on Agents and Artificial Intelligence  
Based on these results, we offer some preliminary guidelines for working with text produced through OCR.  ...  Scholars and libraries are increasingly using OCR-generated text for retrieval and analysis. However, the process of creating text through OCR introduces varying degrees of error to the text.  ...  ACKNOWLEDGEMENTS Work for this paper was produced as part of Living with Machines.  ... 
doi:10.5220/0009169004840496 dblp:conf/icaart/StrienBAHMC20 fatcat:bgops4vhlngwzipjwcox2ls4wq

Reusing the Model and Components of an IIR Study for Perceived Effects of OCR Quality Change

Kimmo Kettunen, Heikki Keskustalo, Birger Larsen, Tuula Pääkkönen, Juha Rautiainen
2022 Zenodo  
The effect of OCR noise on IR performance has been studied earlier by utilizing artificially degraded OCR quality texts (see, e.g., [2, 15]), test collection containing documents with authentic low OCR  ...  However, the research design and its general model could be utilized in the future to study the effects of OCR quality on professional settings entailing historians performing naturalistic phases of their  ...  Subsequently we could evaluate the effect of OCR quality changes on the subjective perceptions based on the relevance evaluations expressed by the test subjects.  ... 
doi:10.5281/zenodo.6513586 fatcat:cg673qdtdbgsnlfav5kw6btt3m
Showing results 1-15 out of 5,363 results