Ensemble LUT classification for degraded document enhancement

Tayo Obafemi-Ajayi, Gady Agam, Ophir Frieder, Berrin A. Yanikoglu, Kathrin Berkner
2008 Document Recognition and Retrieval XV  
The fast evolution of scanning and computing technologies has led to the creation of large collections of scanned paper documents. Examples of such collections include historical collections, legal depositories, medical archives, and business archives. Moreover, in many situations, such as legal litigation and security investigations, scanned collections are used to facilitate systematic exploration of the data. It is almost always the case that scanned documents suffer from some form of degradation. Large degradations make documents hard to read and substantially reduce the performance of automated document processing systems. Enhancement of degraded document images is normally performed assuming global degradation models. When the degradation is large, global degradation models do not perform well. In contrast, we propose to estimate local degradation models and use them in enhancing degraded document images. Using a semi-automated enhancement system, we have labeled a subset of the Frieder diaries collection [1]. This labeled subset was then used to train an ensemble classifier. The component classifiers are based on lookup tables (LUT) in conjunction with the approximated nearest neighbor algorithm. The resulting algorithm is highly efficient. Experimental evaluation results are provided using the Frieder diaries collection [1].
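The abstract does not specify the features, the table construction, or how the component classifiers are combined, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes binary images stored as lists of rows, a 3x3 neighborhood as the local pattern, a random subset of neighbors per ensemble member, and a majority vote across members. The names `LutClassifier`, `train_ensemble`, and `enhance` are hypothetical. For patterns missing from a table, the sketch uses an exact brute-force Hamming nearest-neighbor search as a simple stand-in for the approximated nearest neighbor algorithm mentioned above.

```python
import random
from collections import Counter, defaultdict


def patch_key(img, r, c, offsets):
    """Collect the binary pixels at the given offsets around (r, c) into a tuple."""
    h, w = len(img), len(img[0])
    key = []
    for dr, dc in offsets:
        rr, cc = r + dr, c + dc
        # Assumption: pixels outside the image count as background (0).
        key.append(img[rr][cc] if 0 <= rr < h and 0 <= cc < w else 0)
    return tuple(key)


class LutClassifier:
    """Hypothetical component classifier: local pattern -> most frequent clean label."""

    def __init__(self, offsets):
        self.offsets = offsets
        self.table = {}

    def fit(self, degraded, clean):
        votes = defaultdict(Counter)
        for r in range(len(degraded)):
            for c in range(len(degraded[0])):
                votes[patch_key(degraded, r, c, self.offsets)][clean[r][c]] += 1
        # Keep only the majority label for each observed pattern.
        self.table = {k: cnt.most_common(1)[0][0] for k, cnt in votes.items()}
        return self

    def predict(self, img, r, c):
        key = patch_key(img, r, c, self.offsets)
        if key in self.table:
            return self.table[key]
        # Unseen pattern: fall back to the stored pattern with the smallest
        # Hamming distance (exact search, standing in for approximate NN).
        nearest = min(self.table, key=lambda k: sum(a != b for a, b in zip(k, key)))
        return self.table[nearest]


def train_ensemble(degraded, clean, n_members=5, n_neighbors=4, seed=0):
    """Train members that each see the center pixel plus a random neighbor subset."""
    rng = random.Random(seed)
    neighbors = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    members = []
    for _ in range(n_members):
        offsets = [(0, 0)] + rng.sample(neighbors, n_neighbors)
        members.append(LutClassifier(offsets).fit(degraded, clean))
    return members


def enhance(img, members):
    """Label every pixel by majority vote over the ensemble members."""
    out = []
    for r in range(len(img)):
        row = []
        for c in range(len(img[0])):
            votes = Counter(m.predict(img, r, c) for m in members)
            row.append(votes.most_common(1)[0][0])
        out.append(row)
    return out
```

In this sketch, training takes a degraded image paired with its labeled clean version, and `enhance` then relabels each pixel of a new degraded image. Lookups cost a single dictionary access per member for patterns seen in training, which is one plausible reading of why a LUT-based approach can be efficient; the fallback search is the only step that scales with table size.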
doi:10.1117/12.767120 dblp:conf/drr/Obafemi-AjayiAF08 fatcat:uophygx2lre5tijgzeexj56zwe