Recursive text segmentation for Indonesian Automated Document Reader for people with visual impairment

Teresa Vania Tjahja, Anto Satriyo Nugroho, James Purnama, Nur Aziza Azis, Rose Maulidiyatul Hikmah, Oskar Riandi, Bowo Prasetyo
2011 Proceedings of the 2011 International Conference on Electrical Engineering and Informatics  
This research aims to accommodate the needs of visually impaired people through an intelligent system that reads textual information on paper and produces the corresponding speech. The Indonesian Automated Document Reader (I-ADR) is operated via a voice-based user interface to scan a document page. Textual information is then extracted from the scanned page using Optical Character Recognition (OCR) techniques. The user can choose to have the system read the whole page or listen to a summary of the information on the page. SIDoBI (Sistem Ikhtisar Dokumen untuk Bahasa Indonesia) is integrated into the system to provide the summarization feature. The result of either whole-page reading or summarization is converted to speech by a text-to-speech synthesizer. The whole system is developed under a Free Open Source Software policy and will be distributed openly, at no cost, to all users in need. This paper focuses on the text segmentation algorithm implemented in I-ADR to extract text from documents with complex layouts. We implemented the I-ADR text segmentation module using Enhanced CRLA and propose an improved algorithm for text extraction. Evaluation of the proposed system on various page layouts showed promising results.
doi:10.1109/iceei.2011.6021764 dblp:conf/iceei/TjahjaNPAHRP11 fatcat:iw5hvku7r5ffveu2kqkntclyea