Distance Based Strategy for Supervised Document Image Classification [chapter]

Fabien Carmagnac, Pierre Héroux, Éric Trupin
2004 Lecture Notes in Computer Science  
This paper deals with supervised document image classification. An original distance based strategy allows automatic feature selection.  ...  Each iteration of the classification algorithm computes the distance d between the image to be classified and the chosen representative.  ...  Our strategy simultaneously performs the feature selection for a given problem and the document classification. It is based on the computation of distance between documents.  ... 
doi:10.1007/978-3-540-27868-9_98 fatcat:4eurp6cftbb7jf34gdgctavyj4

IEEE Access Special Section Editorial: Data Mining and Granular Computing in Big Data and Knowledge Processing

Weiping Ding, Gary G. Yen, Gleb Beliakov, Isaac Triguero, Mahardhika Pratama, Xiangliang Zhang, Hongjun Li
2019 IEEE Access  
ontologies are built by reusing and adapting the existing public categories of Chinese judgment documents and the WMD-based similarity computation was made for KNN based document classification.  ...  Big data mining relies on distributed computational strategies; it is often impossible to store and process data on one single computing node.  ... 
doi:10.1109/access.2019.2908776 fatcat:7km2edtcuzeutnwy3pjbvg264e

A review of feature selection methods with applications

A. Jovic, K. Brkic, N. Bogunovic
2015 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO)  
The usual applications of FS are in classification, clustering, and regression tasks. This review considers most of the commonly used FS techniques. Particular emphasis is on the application aspects.  ...  Since exhaustive search for optimal feature subset is infeasible in most cases, many search strategies have been proposed in literature.  ...  ACKNOWLEDGEMENTS This work has been supported in part by the Croatian Science Foundation, within the project "De-identification Methods for Soft and Non-Biometric Identifiers" (DeMSI, UIP-11-2013-1544  ... 
doi:10.1109/mipro.2015.7160458 dblp:conf/mipro/JovicBB15 fatcat:hrqcsfltbzg4vnnwxy3wr3ju4a

Document image retrieval based on texture features and similarity fusion

Fahimeh Alaei, Alireza Alaei, Michael Blumenstein, Umapada Pal
2016 2016 International Conference on Image and Vision Computing New Zealand (IVCNZ)  
The similarity distances between each of the two feature vectors extracted for a given query and the feature vectors extracted from the document images in the training step are computed separately.  ...  The document images are finally ranked based on the greatest visual similarity to the query obtained from the fusion similarity measures.  ...  Creation of Knowledge-based Feature are the corresponding weights for the distances computed based on the classifiers.  ... 
doi:10.1109/ivcnz.2016.7804437 dblp:conf/ivcnz/AlaeiABP16 fatcat:oeqvvjrklbb6vo5eypkmfofcfq

A Graph Lattice Approach to Maintaining Dense Collections of Subgraphs as Image Features

Eric Saund
2011 2011 International Conference on Document Analysis and Recognition  
Document classification and indexing methods depend on having informative image features.  ...  Each feature is itself a subgraph, and a feature vector is a count of occurrences of subgraphs in the image.  ...  Thank you to Fang Liu for discussions and for building a preliminary implementation of the graph lattice machinery.  ... 
doi:10.1109/icdar.2011.216 dblp:conf/icdar/Saund11 fatcat:favkvvory5b5xphnqav5dvcrf4

Unsupervised Classification of Structurally Similar Document Images

Jayant Kumar, David Doermann
2013 2013 12th International Conference on Document Analysis and Recognition  
The approach is based on multiple levels of content and structure. At a local level, a bag-of-visual words based on SURF features provides an effective way of computing content similarity.  ...  In this paper, we present a learning based approach for computing structural similarities among document images for unsupervised exploration in large document collections.  ...  [15] proposed a measure based on minimum edit-distance.  ... 
doi:10.1109/icdar.2013.248 dblp:conf/icdar/KumarD13 fatcat:tqw2frz4qzcvndlkfewc2oa4ka

Logo Recognition Based on the Dempster-Shafer Fusion of Multiple Classifiers [chapter]

Mohammad Ali Bagheri, Qigang Gao, Sergio Escalera
2013 Lecture Notes in Computer Science  
In order to reduce recognition error, a powerful combination strategy based on the Dempster-Shafer theory is utilized to fuse the three classifiers trained on different sources of information.  ...  However, the potential improvement in classification through feature fusion by ensemble-based methods has remained unattended.  ...  The successful recognition of logos facilitates automatic classification of source documents, which is considered a key strategy for document image analysis and retrieval.  ... 
doi:10.1007/978-3-642-38457-8_1 fatcat:b3v5mmqnk5g27jog5seylp6tgm

kNN based image classification relying on local feature similarity

Giuseppe Amato, Fabrizio Falchi
2010 Proceedings of the Third International Conference on SImilarity Search and APplications - SISAP '10  
In this paper, we propose a novel image classification approach, derived from the kNN classification strategy, that is particularly suited to be used when classifying images described by local features  ...  than similarity between images, opening up new opportunities to investigate more efficient and effective strategies.  ...  Keypoints are selected by choosing the most stable points from a set of candidate locations. Each keypoint in an image is associated with one or more orientations, based on local image gradients.  ... 
doi:10.1145/1862344.1862360 dblp:conf/sisap/AmatoF10 fatcat:lencmytdungxhjdn65oo4tf5j4

A Graph Lattice Approach to Maintaining and Learning Dense Collections of Subgraphs as Image Features

Eric Saund
2013 IEEE Transactions on Pattern Analysis and Machine Intelligence  
Effective object and scene classification and indexing depend on extraction of informative image features.  ...  Further performance gains are achieved on a more difficult dataset using a feature voting method and feature selection procedure.  ...  Appreciation is also due to Fang Liu for discussions and for building a preliminary implementation of the graph lattice machinery.  ... 
doi:10.1109/tpami.2012.267 pmid:23267200 fatcat:2324ywfd5fgx7lmaylfcsoexji

Machine Learning of Generalized Document Templates for Data Extraction [chapter]

Janusz Wnek
2002 Lecture Notes in Computer Science  
When comparing document images based on visual similarity it is difficult to determine the correct scale and features for document representation.  ...  Feature selection is used to reduce the dimensionality and redundancy of the size distributions, while preserving the essence of the visual appearance of a document.  ...  Research is currently focused on feature selection strategies which also (re-)introduce spatial information into the size distribution representation.  ... 
doi:10.1007/3-540-45869-7_48 fatcat:eta3chv4yvbo3mp4wuwlnljiqq

SwiftLink: Serendipitous Navigation Strategy for Large-Scale Document Collections

Marc von Wyl, Stephane Marchand-Maillet
2012 2012 23rd International Workshop on Database and Expert Systems Applications  
The multiplication of large-scale document collections has created the need for robust and adaptive access strategies in many applicative areas.  ...  In this paper, we depart from the traditional document search paradigm to move onto the construction of a collection navigation strategy.  ...  EVALUATION STRATEGY AND EXPERIMENTS A. Dataset We consider images as documents here but our model readily applies on all types of documents, using appropriate features.  ... 
doi:10.1109/dexa.2012.52 dblp:conf/dexaw/WylM12 fatcat:qt4upuuxjfaytcnjfeyzshcbie

Automatic Document Logo Detection

G. Zhu, D. Doermann
2007 Proceedings of the International Conference on Document Analysis and Recognition  
At a coarse scale, a trained Fisher classifier performs initial classification using features from document context and connected components.  ...  In this paper, we propose a new approach to logo detection and extraction in document images that robustly classifies and precisely localizes logos using a boosting strategy across multiple image scales  ...  Figs. 2(b) and 2(c) show computed grayscale blobs at scale level σ = 8 and 16, respectively. We select the initial coarse scale σ_n based on the resolution of the input image.  ... 
doi:10.1109/icdar.2007.4377038 dblp:conf/icdar/ZhuD07 fatcat:zgqnkiyuevgtbgxmrm3bdgzso4

Asymmetric Learning and Dissimilarity Spaces for Content-Based Retrieval [chapter]

Eric Bruno, Nicolas Moenne-Loccoz, Stéphane Marchand-Maillet
2006 Lecture Notes in Computer Science  
The proposed approach is evaluated on both artificial data and real image database, and compared with state-of-the-art algorithms. ⋆ This work is funded by the Swiss NCCR (IM)2 (Interactive Multimodal Information  ...  This classification problem is known to be asymmetric, i.e. the negative class does not cluster in the original feature spaces.  ...  Image retrieval A last evaluation is conducted on a Corel image subset. The feature space consists in a 64 RGB histogram and embeds 18521 images annotated by several keywords.  ... 
doi:10.1007/11788034_34 fatcat:gkm2chboqbbuhbqyol4eelivpe

Fast Rule-Line Removal Using Integral Images and Support Vector Machines

Jayant Kumar, David Doermann
2011 2011 International Conference on Document Analysis and Recognition  
We use an integral-image representation which allows fast computation of features and apply techniques for large scale Support Vector learning using a data selection strategy to sample a small subset of  ...  In this paper, we present a fast and effective method for removing pre-printed rule-lines in handwritten document images.  ...  The main bottleneck of taking a pixel-based classification approach for rule-line removal is the feature computation and classification time for each pixel.  ... 
doi:10.1109/icdar.2011.123 dblp:conf/icdar/KumarD11 fatcat:iygkcvh3kvexfak7yswpski3ca

Comparing representative selection strategies for dissimilarity representations

Zane Reynolds, Horst Bunke, Mark Last, Abraham Kandel
2006 International Journal of Intelligent Systems  
Several alternative representative strategies are proposed and empirically evaluated on a set of term vectors constructed from HTML documents.  ...  , and when the representatives are selected randomly, the time required to create the embedded space is significantly reduced, also with a small penalty in accuracy.  ...  Note that the outlier-based selection strategies take longer than the random-based strategies, since they need to compute ½ n² document vector distances.  ... 
doi:10.1002/int.20180 fatcat:3yiu2ei3wvefpftevknz6cuaza