Rule-based document structure understanding with a fuzzy combination of layout and textual features

Stefan Klink, Thomas Kieninger
2001 International Journal on Document Analysis and Recognition  
This paper presents a hybrid and comprehensive approach to document structure analysis. Hybrid in the sense that it makes use of layout (geometrical) as well as textual features of a given document.  ...  Document image processing is a crucial process in office automation and begins at the 'OCR' phase with difficulties in document 'analysis' and 'understanding'.  ...  The rules contain textual and geometrical layout features.  ... 
doi:10.1007/pl00013570 fatcat:ljvftvyqpzchznrnnxn5zjvpsq

A survey of document image classification: problem statement, classifier architecture and performance evaluation

Nawei Chen, Dorothea Blostein
2006 International Journal on Document Analysis and Recognition  
Document image classification is an important step in Office  ...  Acknowledgements We gratefully acknowledge the financial support provided by the Xerox Foundation, and by NSERC, Canada's Natural Sciences and Engineering Research Council.  ...  Most of the surveyed systems use a combination of physical layout features and local image features; this provides a good characterization of structured images.  ... 
doi:10.1007/s10032-006-0020-2 fatcat:2ssef27glvh7dik37emkr4zpd4

Document understanding for a broad class of documents

Marco Aiello, Christof Monz, Leon Todoran, Marcel Worring
2002 International Journal on Document Analysis and Recognition  
We present a document analysis system able to assign logical labels and extract the reading order in a broad set of documents.  ...  All information sources, from geometric features and spatial relations to the textual features and content are employed in the analysis.  ...  Marco Aiello is supported in part by the Italian National Research Council (CNR), grant 203.15.10, and by the Univ. of Amsterdam.  ... 
doi:10.1007/s10032-002-0080-x fatcat:lwbfrujjwzcztfa2vsd2nadume

Modelling the retrieval of structured documents containing texts and images [chapter]

Carlo Meghini, Fabrizio Sebastiani, Umberto Straccia
1997 Lecture Notes in Computer Science  
The model thus combines the power of state-of-the-art document processing techniques with the advantages of a clean, logically defined framework for understanding multimedia document retrieval.  ...  A uniform and powerful query language allows queries to be issued that transparently combine features pertaining to form, content and structure alike.  ...  It is just natural, then, to allow our model to deal not only with the features of these sub-documents, but also with the way these are structured into a complex document.  ... 
doi:10.1007/bfb0026736 fatcat:dyvxrbdtrjbkrn43lkkl2rlfbe

Logical structure detection for heterogeneous document classes

Leon Todoran, Marco Aiello, Christof Monz, Marcel Worring, Paul B. Kantor, Daniel P. Lopresti, Jiangying Zhou
2000 Document Recognition and Retrieval VIII  
We present a fully implemented system based on generic document knowledge for detecting the logical structure of documents for which only general layout information is assumed.  ...  The prominent feature of our framework is its ability to handle documents from heterogeneous collections.  ...  Given the layout of a document, simply by using geometric information, font features and textual content, we are able to identify the logical structure with reasonable accuracy.  ... 
doi:10.1117/12.410827 dblp:conf/drr/Todoran0MW01 fatcat:kjvqp76ryja7ncofwc6mg7ldnu

Using colour information to understand censorship cards of film archives

Oronzo Altamura, Margherita Berardi, Michelangelo Ceci, Donato Malerba, Antonio Varlaro
2006 International Journal on Document Analysis and Recognition  
in all processing steps: namely, image segmentation, layout analysis, document image classification and understanding.  ...  Problems arise due to the low layout quality and standard of such material, which introduces a considerable amount of noise in its description.  ...  In particular, we intend to further enrich the representation language adopted to describe layout structures and to explore the opportunity of relaxing the definition of a subsumption test between clauses  ... 
doi:10.1007/s10032-006-0021-1 fatcat:6trl6lbhybe7zasgm3s2ntjuqu

Transductive Learning of Logical Structures from Document Images [chapter]

Michelangelo Ceci, Corrado Loglisci, Donato Malerba
2011 Studies in Computational Intelligence  
A fundamental task of document image understanding is to recognize semantically relevant components in the layout extracted from a document image.  ...  This contrasts with the more common situation in which we have only few labeled documents and an abundance of unlabeled ones.  ...  This work is in partial fulfillment of the research objectives of the project "ATENEO 2009 - Estrazione, Rappresentazione e Analisi di Dati Complessi". The authors gratefully acknowledge Dr.  ... 
doi:10.1007/978-3-642-22913-8_6 fatcat:bcwi6xj4ivfdlebmd3ojiphtya

Using Fuzzy Logic to Leverage HTML Markup for Web Page Representation [article]

Alberto P. García-Plaza, Víctor Fresno, Raquel Martínez, Arkaitz Zubiaga
2016 arXiv pre-print
We define a set of criteria to exploit the information provided by these page elements, and introduce a fuzzy combination of these criteria that we evaluate within the context of a web page clustering  ...  In this paper we introduce a fuzzy term weighing approach that makes the most of the HTML structure for document clustering.  ...  This work has been part-funded by the Spanish Ministry of Science and Innovation (MED-RECORD Project, TIN2013-46616-C2-2-R) and the PHEME FP7 project (grant No. 611233).  ... 
arXiv:1606.04429v1 fatcat:ktd6knlh55dkzafkiohblme46y

Scalable Feature Extraction from Noisy Documents

Loic Lecerf, Boris Chidlovskii
2009 2009 10th International Conference on Document Analysis and Recognition  
We address the problem as a classification task and propose a method for automatic extraction of relevant features, in presence of content and structural noise, caused by scanning, OCR and segmentation  ...  The method is based on the automatic analysis of documents and requires no particular preprocessing.  ...  This work is partially supported by the ATASH Project co-funded by the French Association on Research and Technology (ANRT).  ... 
doi:10.1109/icdar.2009.227 dblp:conf/icdar/LecerfC09 fatcat:cihmizehgbfnriu2nopv3a2xai

Relational Learning: Statistical Approach Versus Logical Approach in Document Image Understanding [chapter]

Michelangelo Ceci, Margherita Berardi, Donato Malerba
2005 Lecture Notes in Computer Science  
Document image understanding denotes the recognition of semantically relevant components in the layout extracted from a document image.  ...  The goal of this paper is to evaluate and systematically compare two different approaches to relational learning, that is, a statistical approach and a logical approach in the task of document image understanding  ...  Acknowledgments This work has been supported by the annual Scientific Research Project "Gestione dell'informazione non strutturata: modelli, metodi e architetture" Year 2005 funded by the University of  ... 
doi:10.1007/11558590_42 fatcat:yu6s6c665bgwrbyjpmu2of3zxu

A model of multimedia information retrieval

Carlo Meghini, Fabrizio Sebastiani, Umberto Straccia
2001 Journal of the ACM  
In this way, it reconciles similarity-based methods with semantic-based retrieval, providing the guidelines for the design of systems that are able to provide a generalized multimedia retrieval service  ...  The model is formulated in terms of a fuzzy description logic, which plays a twofold role: (1) it directly models semantic retrieval, and (2) it offers an ideal framework for the integration of the multimedia  ...  Finally, our thanks to Riccardo Marangone, who developed Arianna, and to Antonio Lopreiato, who developed the fuzzy ALCO theorem prover.  ... 
doi:10.1145/502102.502103 fatcat:jodoba3l2vazfivfgynaa7eaci

Detecting geographical references in the form of place names and associated spatial natural language

Jochen L. Leidner, Michael D. Lieberman
2011 SIGSPATIAL Special  
it enables the connection of the unstructured textual realm with the structured realm of Geographic Information Systems (GIS) [11].  ...  For example, news stories about events happening in a particular location can be explored on a map for a spatial understanding of these events, as implemented by applications like the European Media Monitor  ...  Acknowledgments The second author was supported in part by the National Science Foundation under Grants IIS-10-18475, IIS-09-48548, IIS-08-12377, CCF-08-30618, and IIS-07-13501.  ... 
doi:10.1145/2047296.2047298 fatcat:r7qwbo7iazgt5mxbc6afhymtoa

Text Categorization Comparison between Simple BPNN and Combinatorial Method of LSI and BPNN

Hemlata Tekwani, Mahak Motwani
2014 International Journal of Computer Applications  
The singular value decomposition (SVD) technique is used in latent semantic analysis, in which a large term-document matrix is decomposed into a set of k orthogonal factors by which the original textual data  ...  The latent semantic representation is an accurate data structure in low-dimensional space in which documents, terms and queries are rooted and also compared.  ...  There are two categories of document image analysis: textual processing, which deals with text components, and non-textual processing, which deals with the non-text components of a document image.  ... 
doi:10.5120/17138-7723 fatcat:q3cnicxrsbhapmh7qlihvii7re

Parsing and interpreting ambiguous structures in spatial hypermedia

Luis Francisco-Revilla, Frank Shipman
2005 Proceedings of the sixteenth ACM conference on Hypertext and hypermedia - HYPERTEXT '05  
When reflecting on information, spatial hypermedia users express their understanding of the information's structure visually.  ...  An alternative approach that provides better support for ambiguity and adaptability is instantiated in FLAPS, an adaptive spatial parser that uses fuzzy logic to infer the implicit structure of  ...  The inferences of all rules are then combined using an OR operation, resulting in a new fuzzy conclusion. This inferred conclusion is then defuzzified into a crisp value.  ... 
doi:10.1145/1083356.1083376 dblp:conf/ht/Francisco-RevillaS05 fatcat:6trebl4muja47njbgrs2mc3syu

A comprehensive survey of mostly textual document segmentation algorithms since 2008

Sébastien Eskenazi, Petra Gomez-Krämer, Jean-Marc Ogier
2017 Pattern Recognition  
It provides a clear typology of documents and of document image segmentation algorithms.  ...  In document image analysis, segmentation is the task that identifies the regions of a document.  ...  this layout with a grammar, a set of rules or they assume that it is a Manhattan layout and use projection profiles.  ... 
doi:10.1016/j.patcog.2016.10.023 fatcat:brysckl6uben3nsed7ottldgs4