A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2019; you can also visit the original URL.
The file type is
This paper explores the use of a learned classifier for post-OCR text correction. Experiments with the Arabic language show that this approach, which integrates a weighted confusion matrix and a shallow language model, improves the vast majority of segmentation and recognition errors, the most frequent types of error on our dataset.arXiv:1604.06225v1 fatcat:5wunhbghdbcpbdhvlaildhzg7y