113,076 Hits in 5.0 sec

Document Recognition for a Million Books

G. Sayeed Choudhury, Tim DiLauro, Robert Ferguson, Michael Droettboom, Ichiro Fujinaga
2006 D-Lib Magazine  
Rather, Gamera is mentioned in this context because it provides a useful benchmark for the types of document recognition capabilities that might be useful with a large corpus of digitized books.  ...  The presence of a large-scale book image corpus significantly raises the possibilities for these important document recognition capabilities, especially given the potential for statistical inferences or  ... 
doi:10.1045/march2006-choudhury fatcat:lzcde56ervbp7bsfggues4276a

A Generic Method for Automatic Ground Truth Generation of Camera-captured Documents [article]

Sheraz Ahmed, Muhammad Imran Malik, Muhammad Zeshan Afzal, Koichi Kise, Masakazu Iwamura, Andreas Dengel, Marcus Liwicki
2016 arXiv   pre-print
The third contribution is a novel method for the recognition of camera-captured document images.  ...  The first contribution is a novel, generic method for automatic ground truth generation of camera-captured document images (books, magazines, articles, invoices, etc.).  ...  ACKNOWLEDGMENTS This work is supported in part by CREST and JSPS Grant-in-Aid for Scientific Research (A)(25240028).  ... 
arXiv:1605.01189v1 fatcat:ldykx3z7yrb63aloo2arqk5mvy

Single Interface for Music Score Searching and Analysis (SIMSSA)

Ichiro Fujinaga, Andrew Hankinson
2015 Zenodo  
Single Interface for Music Score Searching and Analysis (SIMSSA) project targets digitized music scores to design a global infrastructure for searching and analyzing music scores.  ...  Specifically, we seek to provide researchers, musicians, and others to access the contents and metadata of a large number of scores in a searchable, digital format.  ...  The Discovery sub-axis is developing a system that will automatically crawl millions of page images looking for digitized books with musical examples [8].  ... 
doi:10.5281/zenodo.923821 fatcat:rpyvc6e26nepdouyet45ad5zfu

Digitizing a Million Books: Challenges for Document Analysis [chapter]

K. Pramod Sankar, Vamshi Ambati, Lakshmi Pratha, C. V. Jawahar
2006 Lecture Notes in Computer Science  
This paper describes the challenges for document image analysis community for building large digital libraries with diverse document categories.  ...  The challenges are identified from the experience of the on-going activities toward digitizing and archiving one million books.  ...  Raj Reddy, CMU for his valuable guidance of this project and also for his suggestions towards this paper. We thank Prof. N. Balakrishnan of IISC-Bangalore and Prof.  ... 
doi:10.1007/11669487_38 fatcat:up52vbsmizh7voy6g5lp56rzfu

Camlens – An Innovative Android Phone Application To Empower The Blind And Visually Impaired In Reading Any Kind Of Printed Text In Real-Time Using Opencv, Optical Character Recognition And Text-To-Speech

Mr. Sahil Sachdeva, Ms. Akshita Sachdeva, Mr. Yash Bakshi, Prof. Manoj Kumar
2018 Zenodo  
by the blind and visually impaired person the usage of earphones. The genesis of the research comes from the fact that the three edges of a page of the book are easier to find with lesser possibilities  ...  the perspective transformation of the cropped photograph to obtain an image containing the scanned document.  ...  INTRODUCTION 285 million people are estimated to be visually impaired worldwide: 39 million are blind and 246 have low vision and 90% of these live in low-income settings.  ... 
doi:10.5281/zenodo.1451741 fatcat:g4yvu66adbbvtmdfyica46refa

Digital Document Image Retrieval Using Optical Music Recognition

Andrew Hankinson, John Ashley Burgoyne, Gabriel Vigliensoni, Alastair Porter, Jessica Thompson 0001, Wendy Liu, Remi Chiu, Ichiro Fujinaga
2012 Zenodo  
ACKNOWLEDGEMENTS This work would not have been possible without the efforts of a number of people.  ...  Further funding was provided by the Centre for Interdisciplinary Research in Music Media and Technology and the Canadian Foundation for Innovation.  ...  This means that for their required goal of 10 million books, their expected index size is two terabytes of which most of the information is OCR coordinate data.  ... 
doi:10.5281/zenodo.1415562 fatcat:4hlvg7rx25br5d3ivlyprz25li

Enabling Search over Large Collections of Telugu Document Images – An Automatic Annotation Based Approach [chapter]

K. Pramod Sankar, C. V. Jawahar
2006 Lecture Notes in Computer Science  
For the first time, search is enabled over a massive collection of 21 Million word images from digitized document images.  ...  Character recognition based approaches yield poor results for developing search engines for Indian language document images, due to the complexity of the script and the poor quality of the documents.  ...  We demonstrate the power and scalability of our solution by creating a search engine over 500 books of Telugu language document images. The collection contained 75,000 pages with 21 million words.  ... 
doi:10.1007/11949619_75 fatcat:f2t7th6dtnfo5jwtnyaicwncly

KuroNet: Regularized Residual U-Nets for End-to-End Kuzushiji Character Recognition

Alex Lamb, Tarin Clanuwat, Asanobu Kitamoto
2020 SN Computer Science  
Over 3 million books on a diverse array of topics, such as literature, science, mathematics and even cooking are preserved.  ...  Our proposed model KuroNet (which builds on Clanuwat et al. in International conference on document analysis and recognition (ICDAR), 2019) outperforms other models for Kuzushiji recognition.  ...  Overall it has been estimated that there are over 3 million books preserved nationwide [4].  ... 
doi:10.1007/s42979-020-00186-z fatcat:4e5bdbmvxzfpzagc2a7ayzgpse

Universal Digital Library—Future research directions

N. Balakrishnan
2005 Journal of Zhejiang University: Science A  
Other than the Digital Library of India Initiative which is part of the Million Books to the Web Project initiated by Prof Raj Reddy of Carnegie Mellon University, there are a few more initiatives in India  ...  This paper presents the future directions for the Digital Library of India Initiative both in terms of growing collection and the technical challenges in managing such large collection poses.  ...  Currently more than 120 000 books (around 50 million pages) have been scanned and most of them are available on the Web for free browsing.  ... 
doi:10.1631/jzus.2005.a1204 fatcat:ieoxjyhfwja5vj3ynqorwsfyyu

KuroNet: Pre-Modern Japanese Kuzushiji Character Recognition with Deep Learning [article]

Tarin Clanuwat, Alex Lamb, Asanobu Kitamoto
2019 arXiv   pre-print
Over 3 million books on a diverse array of topics, such as literature, science, mathematics and even cooking are preserved.  ...  The result has been datasets with hundreds of millions of photographs of historical documents which can only be read by a small number of specially trained experts.  ...  For these reasons the vast majority of these books and documents have not yet been transcribed into modern Japanese characters. A.  ... 
arXiv:1910.09433v1 fatcat:ap7u6mnaabfxxdknkwauwynqhe

Page 678 of MH: Mental Hygiene Vol. 38, Issue 4 [page]

1954 MH: Mental Hygiene  
problems for millions of individuals.  ...  In both books there is also recognition that with aging there occur frustrations and deprivations that may be beyond individual control, to pose, not a problem of the aged for society, but rather personal  ... 

Nearest neighbor based collection OCR

Pramod Sankar K., C. V. Jawahar, R. Manmatha
2010 Proceedings of the 8th IAPR International Workshop on Document Analysis Systems - DAS '10  
We show from a selection of 33 Telugu books that starting with OCR labels for only 30% of the collection we can recognize the remaining 70% of the words in the collection with 70% accuracy using this approach  ...  Conventional optical character recognition (OCR) systems operate on individual characters and words, and do not normally exploit document or collection context.  ...  Manmatha was supported in part by the Center for Intelligent Information Retrieval and in part by NSF IIS-0910884.  ... 
doi:10.1145/1815330.1815357 dblp:conf/das/SankarJM10 fatcat:szhehqsdj5gyzareox2tzcd7fm

The Objectives and Activities of the Publishers Association's Serial Publishers Executive (SPE)

John Davies
1997 Serials: The Journal for the Serials Community  
The SPE aims to ensure that serial publishing is given its rightful place and recognition in the scheme of things.  ...  Publishing turnover for academic and professional books in the United Kingdom was £694 million last year. Turnover for academic and professional journals was £626 million.  ...  It was apparent during the preparations for the Dearing submission that the market for books is much better documented than the market for journals.  ... 
doi:10.1629/1024 fatcat:7koqk55ggbao7gt5jyyd5ceuqq

A Smart Reader for Blind People

2019 International Journal of Engineering and Advanced Technology  
To read the text a human needs a vision. Survey conducted on several papers and systems provides hardware consisting of a camera interface with Raspberry Pi for processing the text.  ...  The Raspberry Pi makes use of Optical Character Recognition (OCR) software installed in it, to perform the conversion of an image to text and similarly text to speech conversion.  ...  Optical character recognition (OCR) is the technology used for translating a captured image of written text into machine-encoded text.  ... 
doi:10.35940/ijeat.f1285.0986s319 fatcat:dwdoi73l7nf5rg46mj7syaseum

Estimating the Effects of Text Genre, Image Resolution and Algorithmic Complexity needed for Sinhala Optical Character Recognition

Isuri Anuradha, Chamila Liyanage, Ruvan Weerasinghe
2021 The International Journal on Advances in ICT for Emerging Regions  
While optical character recognition for Latin based scripts have seen near human quality performance, the accuracy for the rounded scripts of South Asia still lags behind.  ...  a realistic estimation of the complexity of recognizing the rounded script of Sinhala.  ...  ACKNOWLEDGMENT This work was carried out as a part of a project funded by Theekshana -Research and Development Company. We acknowledge Mrs.  ... 
doi:10.4038/icter.v14i3.7231 fatcat:zbq2kjrlnrepbew5zt4grl2vuy