
The lifecycle of a digital historical document

A. Antonacopoulos, D. Karatzas, H. Krawczyk, B. Wiszniewski
2004 Proceedings of the 2004 ACM symposium on Document engineering - DocEng '04  
This paper describes the lifecycle of a digital historical document, from template-based structure definition through to content extraction from the scanned pages and its final reconstitution as an electronic  ...  document (combining content and semantic information) along with the tools that have been created to realise each stage in the lifecycle.  ...  Content Extraction The content extraction phase aims at locating and understanding the information existing in the scanned document, in order to appropriately fill in the document model (XML file) for  ... 
doi:10.1145/1030397.1030427 dblp:conf/doceng/AntonacopoulosKKW04 fatcat:fuykgmdhnzdmrniyvb37rqcbxi


Shubham Nagmoti, Kapil Bhoyar, Shantanu Raut, Saransh Jamgade, Nikhil Mangrulkar, Aniket Pathade
2021 Journal of research in engineering and applied sciences  
In This paper we have proposed about document scanning in terms of a software interface i.e. web application that does an automated digitization of document with various features such as image enhancement  ...  Document scanning can be the way Unlike the traditional manual method of creating and preserving document which comes with many benefits such as more office space, information storing, sharing, better  ...  Introduction Document scanning has immensely evolved ever since digital era. It is important step or the first step in text recognition and image enhancement.  ... 
doi:10.46565/jreas.2021.v06i02.008 fatcat:5fq46lsglfhvhmj7xq2il7473e

Semantics-based content extraction in typewritten historical documents

A. Antonacopoulos, D. Karatzas
2005 Eighth International Conference on Document Analysis and Recognition (ICDAR'05)  
This paper presents a flexible approach to extracting content from scanned historical documents using semantic information.  ...  Results show that such a conversion strategy aided by (expert) user-specified semantic information and which enables the processing of individual parts of the document in a specialised way, produces superior  ...  Wiszniewski (partners in the MEMORIAL project) who were responsible for the nondocument image analysis aspects.  ... 
doi:10.1109/icdar.2005.215 dblp:conf/icdar/AntonacopoulosK05 fatcat:ezw4rc6bibforlpyumwdvcxoii

Improved Hybrid Binarization based on Kmeans for Heterogeneous document processing

Mahmoud Soua, Rostom Kachouri, Mohamed Akil
2015 2015 9th International Symposium on Image and Signal Processing and Analysis (ISPA)  
Nevertheless, in Heterogeneous documents, HBK ends up with some issues when extracting foreground text from complex background images.  ...  It handles effectively scanned documents which includes text on simple background.  ...  To deal with text extraction from complex background, documents can be enhanced to increase the image visibility and details which helps to distinguish characters.  ... 
doi:10.1109/ispa.2015.7306060 dblp:conf/ispa/SouaKA15 fatcat:py55ippqwngyzbh652lwt5w2ru
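The HBK entry above concerns clustering-based binarization. As a rough illustration only, a plain two-cluster k-means over grey levels (a generic technique, not the authors' hybrid HBK method) can choose one global threshold at the midpoint between the dark and light cluster centres. The function names and the input format (8-bit grey levels as nested lists) are assumptions made for this sketch.

```python
from typing import List


def kmeans_threshold(pixels: List[int], iterations: int = 20) -> float:
    """Two-cluster 1-D k-means on grey levels (illustrative sketch only).

    Returns the midpoint between the dark and light cluster centres,
    used here as a single global threshold.
    """
    dark_c, light_c = float(min(pixels)), float(max(pixels))  # initial centres
    for _ in range(iterations):
        mid = (dark_c + light_c) / 2
        dark = [p for p in pixels if p <= mid]
        light = [p for p in pixels if p > mid]
        # Recompute each centre as the mean of its members; keep it if empty.
        new_dark = sum(dark) / len(dark) if dark else dark_c
        new_light = sum(light) / len(light) if light else light_c
        if new_dark == dark_c and new_light == light_c:
            break  # assignments have stabilised
        dark_c, light_c = new_dark, new_light
    return (dark_c + light_c) / 2


def binarize(image: List[List[int]]) -> List[List[int]]:
    """Mark pixels in the dark cluster (assumed to be ink) as 1, others as 0."""
    threshold = kmeans_threshold([p for row in image for p in row])
    return [[1 if p <= threshold else 0 for p in row] for row in image]
```

For example, `binarize([[10, 12, 200], [205, 11, 198]])` settles on a threshold of 106.0 and returns `[[1, 1, 0], [0, 1, 0]]`. A single global threshold like this works on text over a simple background but is exactly what tends to fail on the complex backgrounds the snippet mentions.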

A Complete Approach to the Conversion of Typewritten Historical Documents for Digital Archives [chapter]

Apostolos Antonacopoulos, Dimosthenis Karatzas
2004 Lecture Notes in Computer Science  
This paper presents a complete system that historians/archivists can use to digitize whole collections of documents relating to personal information.  ...  The system integrates tools and processes that facilitate scanning, image indexing, document (physical and logical) structure definition, document image analysis, recognition, proofreading/correction and  ...  Region specifications are interpreted by the document analysis methods to extract precise text regions (e.g. textlines, table cells etc.) from the scanned document pages.  ... 
doi:10.1007/978-3-540-28640-0_9 fatcat:olce7jt4abgstkvs252f3l4zwy

Document Image Processing - A Review

Shazia Akram, Mehraj-Ud-Din Dar, Aasia Quyoum
2010 International Journal of Computer Applications  
Analysis of document images for information extraction has become very prominent in recent past.  ...  This article examines the various methods used for document image processing in order to achieve a processed document having high quality, accuracy and fast retrieval.  ...  in image of documents & to extract intended information from them.  ... 
doi:10.5120/1475-1991 fatcat:hbcl73h53fd3lhzhtcioht4wby

Re-typograph phase I: a proof-of-concept for typeface parameter extraction from historical documents

Bart Lamiroy, Thomas Bouville, Julien Blégean, Hongliu Cao, Salah Ghamizi, Romain Houpin, Matthias Lloyd, Eric K. Ringger
2015 Document Recognition and Retrieval XXII  
In our case, the main goal is to extract typographically significant information from scanned shapes of characters, and to reconstruct, to the best of our ability the overall font (or, if possible, typeface  ...  The current version operates in the following manner: 1. From a scanned original document, we use the Agora and Retro software developed by Ramel et al.  ...  from scanned documents.  ... 
doi:10.1117/12.2075813 dblp:conf/drr/LamiroyBBCGHL15 fatcat:fppekszsizdsncpmdk4ghy2qpe

A Tool for Scanning Document-Images with a Photophone or a Digicam [chapter]

M. El Rhabi, A. Hakim, Z. Mahani, K. Messou, S. Saoud
2012 Communications in Computer and Information Science  
In this work, we propose a tool to scan a document-image acquired with a cameraphone. Firstly, we try to reduce the noise in the document-image.  ...  From this step, we can expect the document to a real quadrangle.  ...  to business critical information while enhancing the architecture in place.  ... 
doi:10.1007/978-3-642-35594-3_45 fatcat:dwxg6q72qzbvbe57qgpcn4zgt4

Enhancing Open Data Knowledge by Extracting Tabular Data from Text Images

Andrei Puha, Octavian Rinciog, Vlad Posea
2018 Proceedings of the 7th International Conference on Data Science, Technology and Applications  
In this paper we present an algorithm which enhances nowadays knowledge by extracting tabular data from scanned pdf documents in an efficient way.  ...  After testing the proposed method on several low quality scanned pdf documents, it turned out that our methodology performs alike dedicated OCR paid software and we have integrated this algorithm as a  ...  /repo/sparql6 CONCLUSION In this paper we presented an algorithm which enhances nowadays knowledge by extracting tabular data from scanned pdf documents in an efficient way.  ... 
doi:10.5220/0006862402200228 dblp:conf/data/PuhaRP18 fatcat:c23kqyex7fflzedpjwn7mmaxby

Digitization and online availability of original collecting mission data to improve data quality and enhance the conservation and use of plant genetic resources

Imke Thormann, Hannes Gaisberger, Federico Mattei, Laura Snook, Elizabeth Arnaud
2012 Genetic Resources and Crop Evolution  
within the GPG2 activity 4.1 on completion of passport data entry to assess knowledge and gaps in the diversity and genetic quality of the collections.  ...  Acknowledgements The recently concluded World Bank-funded project, entitled "Collective Action for the Rehabilitation of Global Public Goods in the CGIAR Genetic Resources System: Phase 2" (GPG2), implemented  ...  Using this setting resulted in large file sizes and high-quality images. Each scanned page of a set of documents from a collecting mission was first saved as single uncompressed tagged image file (TIF)  ... 
doi:10.1007/s10722-012-9804-z fatcat:7kjvqcbgx5b3jnfhuwt3kqzjji

Data hiding in hard-copy text documents robust to print, scan and photocopy operations

Avinash L. Varna, Shantanu Rane, Anthony Vetro
2009 2009 IEEE International Conference on Acoustics, Speech and Signal Processing  
Using a simple correlation-based detector in conjunction with an error correction code, the hidden data can be extracted from a photocopy of the printed document.  ...  By enhancing the detector with an optical character recognition algorithm, the embedded data can be extracted even after multiple rounds of photocopying.  ...  Thus, we can successfully extract the embedded data from the printed document and the first copy in most cases.  ... 
doi:10.1109/icassp.2009.4959854 dblp:conf/icassp/VarnaRV09 fatcat:kzcueg2ygvcf3gqn6rp7ybpb2y
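The Varna et al. snippet pairs a correlation-based detector with an error-correction code. The sketch below is a generic spread-spectrum illustration of that pairing, not the paper's print-scan embedding scheme. It assumes the hidden bits modulate a list of real-valued host features (for example, hypothetical per-character measurements), uses a seeded ±1 pseudo-noise sequence as the shared key, and substitutes a simple repetition code with majority voting for whatever code the authors actually used. All function names are hypothetical.

```python
import random
from typing import List


def pn_sequence(seed: int, length: int) -> List[int]:
    """Pseudo-random +/-1 chips shared by embedder and detector via the seed."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]


def repeat_encode(bits: List[int], r: int) -> List[int]:
    """Repetition code: send every message bit r times."""
    return [b for b in bits for _ in range(r)]


def majority_decode(coded: List[int], r: int) -> List[int]:
    """Majority vote over each group of r coded bits (r assumed odd)."""
    return [1 if 2 * sum(coded[i:i + r]) > r else 0
            for i in range(0, len(coded), r)]


def embed(host: List[float], coded: List[int], seed: int,
          chip: int, alpha: float) -> List[float]:
    """Add +/-alpha times the PN chips to `chip` host features per coded bit.

    Assumes len(host) >= len(coded) * chip.
    """
    pn = pn_sequence(seed, len(coded) * chip)
    out = list(host)
    for i, bit in enumerate(coded):
        sign = 1 if bit else -1
        for j in range(chip):
            k = i * chip + j
            out[k] += alpha * sign * pn[k]
    return out


def detect(received: List[float], n_coded: int, seed: int,
           chip: int) -> List[int]:
    """Correlate each segment with the PN chips; a positive sum decodes as 1."""
    pn = pn_sequence(seed, n_coded * chip)
    bits = []
    for i in range(n_coded):
        corr = sum(received[i * chip + j] * pn[i * chip + j]
                   for j in range(chip))
        bits.append(1 if corr > 0 else 0)
    return bits
```

As a usage sketch: encode `[1, 0, 1, 1]` with `repeat_encode(message, 3)`, embed it with `chip=8` and `alpha=1.0` into host features perturbed by noise in [-0.5, 0.5), then run `detect` followed by `majority_decode`. Because the noise contribution to each correlation is bounded by 4 while the embedded term is ±8, every coded bit is recovered. The repetition code then absorbs any single flipped bit per group that a harsher channel might introduce.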


A. Karagianni
2021 The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences  
Digital processing of satellite imagery provided the extraction of additional enhanced data regarding the physiognomy of the surrounding area.  ...  Technological advances in the field of information acquisition have led to the development of various techniques regarding building documentation.  ...  Sidiropoulos for his valuable help in data collection.  ... 
doi:10.5194/isprs-archives-xlvi-m-1-2021-361-2021 fatcat:v4sikl6krbcltdmheru6hswy74

ROLE OF SCANNING AND OCR (Optical Character Recognization) IN DIGITIZATION

2019 Zenodo  
ROLE of SCANNING and OCR in DIGITIZATION  ...  Now a days libraries are suffer from inadequate space problem due to overgrowing collection of printed documents day-by-day.  ...  Instead, OCR extracts relevant information and enters it automatically. The result is accurate, efficient information processing in less time.  ... 
doi:10.5281/zenodo.2558381 fatcat:otmqgk7lgncglkjwin3h3ghcqa

Research on document digitization processing technology

Ruili Zhang, Yanming Yang, Wenxiu Wang, J. Joo
2020 MATEC Web of Conferences  
The digitalization of document information is the development direction of the digitalization of document information management, which involves various technologies such as digitalization technology,  ...  Through the PDF document loading display, change the page replacement storage, the technical page jump to achieve the PDF document programming control.  ...  In general, we should try our best to achieve the goal that a lifetime use by one scan from the perspective of protecting the original document.  ... 
doi:10.1051/matecconf/202030902014 fatcat:6w424agckfeytcialteilmwccy

EMODnet Workshop on mechanisms and guidelines to mobilise historical data into biogeographic databases

Sarah Faulwetter, Evangelos Pafilis, Lucia Fanini, Nicolas Bailly, Donat Agosti, Christos Arvanitidis, Laura Boicenco, Terry Capatano, Simon Claus, Stefanie Dekeyzer, Teodor Georgiev, Aglaia Legaki (+9 others)
2016 Research Ideas and Outcomes  
The objective of Workpackage 4 of the European Marine Observation and Data network (EMODnet) is to fill spatial and temporal gaps in European marine species occurrence data availability by carrying out  ...  Photocopies or scanned documents often are of low quality, not always OCRed, or the OCR text is of low quality.  ...  This prevents OCR software from correctly recognising certain characters in a scanned document (Fig. 3e) .  ... 
doi:10.3897/rio.2.e9774 fatcat:6kuvhwschjehbnwz2p3k4alu4q