A Two Stage Method for Bengali Text Extraction from Still Images Containing Text

Ankita Sikdar
Computer Science & Information Technology (CS & IT), 2012 (unpublished)
Bengali text present in multimedia images that combine several content forms, such as still images and text, contains information with many applications once it is extracted. The images can be of different types: objects and text may be completely separated, overlapping, or embedded in each other, and the Bengali text itself can vary in shape and size. Extracting text from such images is challenging because the textual portion has to be correctly separated from the ... of the background. The input image passes through two stages. The first stage locates the different components in the image using entropy filtering, and the second stage separates the components that represent text from the non-textual ones based on several features of Bengali text. The extracted text can then be passed to software such as a Bengali OCR system for character recognition.
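The two-stage idea can be illustrated with a minimal sketch. It assumes a grayscale input, a square window of radius 2, 16 gray-level bins, and a fixed entropy threshold of 0.5 bits; none of these values come from the paper. Stage one computes local Shannon entropy, thresholds it into a mask of high-activity regions, and groups the mask into 8-connected components with bounding boxes. The function names are hypothetical, and `looks_like_text` is only a geometric placeholder for stage two, since the abstract does not list the Bengali-specific features the authors use.

```python
import numpy as np
from collections import deque


def local_entropy(img, radius=2, levels=16):
    """Shannon entropy (bits) of quantized gray levels in a (2r+1)x(2r+1) window."""
    img = np.asarray(img, dtype=np.float64)
    # Quantize 0..255 intensities into `levels` bins so each window histogram is small.
    q = np.clip((img * levels / 256.0).astype(int), 0, levels - 1)
    h, w = q.shape
    padded = np.pad(q, radius, mode="edge")
    size = (2 * radius + 1) ** 2
    out = np.zeros((h, w))
    for y in range(h):
        for x in range(w):
            win = padded[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            counts = np.bincount(win.ravel(), minlength=levels)
            p = counts[counts > 0] / size
            out[y, x] = -np.sum(p * np.log2(p))
    return out


def label_components(mask):
    """8-connected component labeling by BFS; returns labels and (y0, x0, y1, x1) boxes."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    boxes = []
    current = 0
    for y in range(h):
        for x in range(w):
            if not mask[y, x] or labels[y, x]:
                continue
            current += 1
            labels[y, x] = current
            queue = deque([(y, x)])
            y0, x0, y1, x1 = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                y0, x0 = min(y0, cy), min(x0, cx)
                y1, x1 = max(y1, cy), max(x1, cx)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not labels[ny, nx]:
                            labels[ny, nx] = current
                            queue.append((ny, nx))
            boxes.append((y0, x0, y1, x1))
    return labels, boxes


def locate_components(img, radius=2, levels=16, threshold=0.5):
    """Stage one (sketch): entropy filter, threshold, then connected components."""
    mask = local_entropy(img, radius, levels) > threshold
    _, boxes = label_components(mask)
    return boxes


def looks_like_text(box, min_size=4, max_aspect=15.0):
    """Stage two placeholder (hypothetical): size and aspect-ratio checks only."""
    y0, x0, y1, x1 = box
    h, w = y1 - y0 + 1, x1 - x0 + 1
    if h < min_size or w < min_size:
        return False
    return max(h, w) / min(h, w) <= max_aspect
```

For example, on a flat gray image containing one high-contrast textured patch, `locate_components` returns a single box covering the patch plus a margin of roughly `radius` pixels, because windows that straddle the patch border also have non-zero entropy. A real implementation would more likely use an optimized rank filter such as scikit-image's `skimage.filters.rank.entropy` instead of the pure-Python loop shown here.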
doi:10.5121/csit.2012.2306 fatcat:fe3fgaothzbudgcjzbixzn7k5m