Scalable Feature Extraction from Noisy Documents
2009
2009 10th International Conference on Document Analysis and Recognition
We address metadata recognition in layout-oriented documents. We treat the problem as a classification task and propose a method for automatically extracting relevant features in the presence of content and structural noise caused by scanning, OCR, and segmentation problems. The method is based on automatic analysis of the documents and requires no particular preprocessing. It mines the documents and determines frequent patterns, which are both literal patterns and their
doi:10.1109/icdar.2009.227
dblp:conf/icdar/LecerfC09
fatcat:cihmizehgbfnriu2nopv3a2xai