Joint Layout Analysis, Character Detection and Recognition for Historical Document Digitization [article]

Weihong Ma, Hesuo Zhang, Lianwen Jin, Sihang Wu, Jiapeng Wang, Yongpan Wang
2020 arXiv   pre-print
We then use Hough transform for line detection on the binary mask and combine character results with the layout information to restore document content.  ...  In this paper, we propose an end-to-end trainable framework for restoring historical documents content that follows the correct reading order.  ...  In general, a document digitization system consists of two principal stages: layout analysis and text recognition.  ... 
arXiv:2007.06890v1 fatcat:rb3re4ajxvgg5dq7gl7mvbovoi

Segmentation and Recognition for Historical Tibetan Document Images

Longlong Ma, Congjun Long, Lijuan Duan, Xiqun Zhang, Yanxing Li, Quanchao Zhao
2020 IEEE Access  
This paper proposes an overall segmentation and recognition framework for historical Tibetan document images.  ...  These documents are converted into digital form using Tibetan document segmentation and recognition methods.  ...  LAYOUT SEGMENTATION Layout segmentation is an important step in the automatic digitization of historical documents.  ... 
doi:10.1109/access.2020.2975023 fatcat:qlvqseky65d37jfcp6hv65whcq

Document Analysis Systems for Digital Libraries: Challenges and Opportunities [chapter]

Henry S. Baird, Venugopal Govindaraju, Daniel P. Lopresti
2004 Lecture Notes in Computer Science  
Implications of technical demands made within digital libraries (DL's) for document image analysis systems are discussed.  ...  The state-of-the-art is summarized, including a digest of themes that emerged during the recent International Workshop on Document Image Analysis for Libraries.  ...  Layout Analysis and Meta-data Extraction. Layout analysis and metadata extraction is a crucial step in creating an information base for historical DL's.  ... 
doi:10.1007/978-3-540-28640-0_1 fatcat:3szb2elcm5amvlhvma3kbwmzza

You Actually Look Twice At it (YALTAi): using an object detection approach instead of region segmentation within the Kraken engine [article]

Thibault Clérice
2022 arXiv   pre-print
Layout Analysis (the identification of zones and their classification) is the first step along line segmentation in Optical Character Recognition and similar tasks.  ...  We show that most segmenters focus on pixel classification and that polygonization of this output has not been used as a target for the latest competition on historical document (ICDAR 2017 and onwards  ...  However, historical document layout analysis competition saw the light of the day in 2011, as a joint venture from ICDAR2011 and HIP2011.  ... 
arXiv:2207.11230v1 fatcat:ubc7ebgi5zdh3mxnsnmo6rfw24

Editorial for special issue on "Advanced Topics in Document Analysis and Recognition"

Cheng-Lin Liu, Andreas Dengel, Rafael Dueire Lins
2019 International Journal on Document Analysis and Recognition  
method for historical documents.  ...  The technology of Document Analysis and Recognition, as a subfield of pattern recognition, faces many application needs in the real world, such as the digitization of books, newspapers and archives, invoices  ... 
doi:10.1007/s10032-019-00342-z fatcat:t4f7ruobrbf47hvudczsrg4r2q

Word-Based Adaptive OCR for Historical Books

Vladimir Kluzner, Asaf Tzadok, Yuval Shimony, Eugene Walach, Apostolos Antonacopoulos
2009 2009 10th International Conference on Document Analysis and Recognition  
The aim of this work is to propose a new approach to the recognition of historical texts by providing an adaptive mechanism that automatically tunes itself to a specific book.  ...  The paper describes the architecture of such a system and new algorithms that have been developed for robust word image comparison (including registration, optical flow based distortion compensation, and  ...  Acknowledgements: The authors thank Tal Drory and Ami Ben-Horesh of the IBM Haifa and Günter Mühlberger of Innsbruck University for many valuable discussions.  ... 
doi:10.1109/icdar.2009.133 dblp:conf/icdar/KluznerTSWA09 fatcat:56ih3jmipzcihn6cmykbt4t7ia

Digital Palaeography: New Machines and Old Texts (Dagstuhl Seminar 14302)

Tal Hassner, Robert Sablatnig, Dominique Stutzmann, Ségolène Tarte, Marc Herbstritt
2014 Dagstuhl Reports  
developed in Computer Vision for the analysis of digital images.  ...  This report documents the program and the outcomes of Dagstuhl Seminar 14302 "Digital Palaeography: New Machines and Old Texts", which focused on the interaction of Palaeography and computerized tools  ...  The research leading to these results has received funding from the Agence Nationale de la Recherche and Cap Digital under grant agreement no. ANR-12-CORP-0010.  ... 
doi:10.4230/dagrep.4.7.112 dblp:journals/dagstuhl-reports/HassnerSST14 fatcat:3xnz4ouqljcwpmjxbs6bislute

A brief review of document image retrieval methods: Recent advances

Fahimeh Alaei, Alireza Alaei, Michael Blumenstein, Umapada Pal
2016 2016 International Joint Conference on Neural Networks (IJCNN)  
Due to the rapid increase of different digitized documents, the development of a system to automatically retrieve document images from a large collection of structured and unstructured document images  ...  Many techniques have been developed to provide an efficient and effective way for retrieving and organizing these document images in the literature.  ...  Optical Character Recognition (OCR) is a traditional textual recognition method used for retrieval.  ... 
doi:10.1109/ijcnn.2016.7727648 dblp:conf/ijcnn/AlaeiABP16 fatcat:5tzfmk55r5hmpa3tnhcj3chuji

Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers [article]

Raphaël Barman, Maud Ehrmann, Simon Clematide, Sofia Ares Oliveira, Frédéric Kaplan
2020 arXiv   pre-print
Research work seeking to automatically process facsimiles and extract information thereby are multiplying with, as a first essential step, document layout analysis.  ...  The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration.  ...  ACKNOWLEDGMENTS We warmly thank the journal Le Temps and the Swiss and Luxembourgish National Libraries for giving us access to their newspaper archive collections in the context of the impresso project  ... 
arXiv:2002.06144v4 fatcat:43kx7oyorbaqrbni4gvqeetygy

Open source optical character recognition for historical research

Tobias Blanke, Michael Bryant, Mark Hedges
2012 Journal of Documentation  
pre-processing and layout analysis.  ...  We present two of our case studies, which demonstrate how this can be achieved and how OCR can be embedded into wider digitally-enabled historical research.  ...  characters with overlapping descenders and ascenders, all of which is common for historical documents.  ... 
doi:10.1108/00220411211256021 fatcat:bq3j2npcfzburmpqg5mgs4qyy4

Segmentation of historical machine-printed documents using Adaptive Run Length Smoothing and skeleton segmentation paths

Nikos Nikolaou, Michael Makridis, Basilis Gatos, Nikolaos Stamatopoulos, Nikos Papamarkos
2010 Image and Vision Computing  
, (ii) detection of noisy areas and punctuation marks that are usual in historical machine-printed documents, (iii) detection of possible obstacles formed from background areas in order to separate neighboring  ...  In this paper, we strive towards the development of efficient techniques in order to segment document pages resulting from the digitization of historical machine-printed sources.  ...  [1] where page layout analysis is performed on historical printed books.  ... 
doi:10.1016/j.imavis.2009.09.013 fatcat:iqdu2sjlgngwfe4pv7ki76vok4

Combining Visual and Textual Features for Semantic Segmentation of Historical Newspapers

Barman, Ehrmann, Clematide, Ares Oliveira, Kaplan
2021 Zenodo  
Research work seeking to automatically process facsimiles and extract information thereby are multiplying with, as a first essential step, document layout analysis.  ...  The massive amounts of digitized historical documents acquired over the last decades naturally lend themselves to automatic processing and exploration.  ...  ACKNOWLEDGMENTS We warmly thank the journal Le Temps and the Swiss and Luxembourgish National Libraries for giving us access to their newspaper archive collections in the context of the impresso project  ... 
doi:10.5281/zenodo.4065270 fatcat:zwymvcvxofb3fpwgasmvo54r6i

Image analysis for digital media applications

Hong Yan
2001 IEEE Computer Graphics and Applications  
Acknowledgment Our work on cartoon image analysis, handwriting recognition, and document image compression is supported by several grants from the Australian Research Council.  ...  Many research staff and postgraduate students of the Image Processing Lab at the University of Sydney and the Signal Processing Lab at the City University of Hong Kong have worked on these projects and  ...  The solution to this complicated problem requires layout analysis combined with character recognition. Character recognition.  ... 
doi:10.1109/38.895126 fatcat:pfhnkx3zyrhtpmpwchb7dhg4ym

Document AI: Benchmarks, Models and Applications [article]

Lei Cui, Yiheng Xu, Tengchao Lv, Furu Wei
2021 arXiv   pre-print
In recent years, the popularity of deep learning technology has greatly advanced the development of Document AI, such as document layout analysis, visual information extraction, document visual question  ...  Document AI, or Document Intelligence, is a relatively new research topic that refers to the techniques for automatically reading, understanding, and analyzing business documents.  ...  Document layout analysis can essentially be regarded as an object detection task for document images.  ... 
arXiv:2111.08609v1 fatcat:7mg67htkgbgyjg63hlegd32m24

Logical segmentation for article extraction in digitized old newspapers

Thomas Palfray, David Hebert, Stéphane Nicolas, Pierrick Tranouez, Thierry Paquet
2012 Proceedings of the 2012 ACM symposium on Document engineering - DocEng '12  
CRF with multi-scale quantization feature functions Application to structure extraction in old newspaper », Document Analysis and Recognition (ICDAR), 2011 International Conference on, IEEE, p. 493-497  ...  character", "title inter-character" and "title inter-word") • text line (compound by the labels "character", "intercharacter" and "inter-word") • noise • background Each image is analysed line by line  ... 
doi:10.1145/2361354.2361383 dblp:conf/doceng/PalfrayHNTP12 fatcat:mqwwwfu2lrhc3ftqx3v66kfrqe