We propose a method for constructing a vector that represents the content of a document image, in order to facilitate text retrieval. The method builds on the N-gram approach to measuring text similarity, which compares documents by how often each n-character string occurs in their electronic text. Instead of using ASCII character codes, the present study investigates the use of character images to obtain the document vector, and it has found the results promising for use in our news article retrieval project.

doi:10.1109/icpr.2000.902941 dblp:conf/icpr/YuT00 fatcat:75ftc4vlvnhuvp5qmja3bozhk4
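The N-gram similarity idea summarized above can be sketched as follows. This is a minimal sketch, assuming relative-frequency weighting and cosine similarity; the original method may weight or normalize the vector differently. Because the functions accept any sequence of hashable symbols, the same code works on ASCII characters or on hypothetical per-character image class labels (for example, cluster IDs assigned to segmented character images). How the authors map character images to symbols is not stated in the abstract, so that mapping is an assumption here.

```python
from collections import Counter
from math import sqrt
from typing import Dict, Hashable, Sequence, Tuple

Gram = Tuple[Hashable, ...]


def ngram_vector(symbols: Sequence[Hashable], n: int = 3) -> Dict[Gram, float]:
    """Map each n-symbol string to its relative frequency in the sequence.

    `symbols` may be a text string (ASCII characters) or, as an assumption,
    a list of hypothetical character-image class labels.
    """
    grams = Counter(tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1))
    total = sum(grams.values())
    # Sequences shorter than n yield no n-grams and hence an empty vector.
    return {g: c / total for g, c in grams.items()} if total else {}


def cosine(u: Dict[Gram, float], v: Dict[Gram, float]) -> float:
    """Cosine similarity of two sparse n-gram vectors (0.0 if either is empty)."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


if __name__ == "__main__":
    # ASCII text input.
    a = ngram_vector("stock market news", 3)
    b = ngram_vector("stock market report", 3)
    print(round(cosine(a, b), 3))

    # Hypothetical image-based input: each integer stands for the class label
    # of one segmented character image.
    img_a = ngram_vector([4, 9, 2, 4, 9, 2, 7], 3)
    img_b = ngram_vector([4, 9, 2, 7, 1], 3)
    print(round(cosine(img_a, img_b), 3))
```

Keeping the vector sparse (a dictionary keyed by n-gram) avoids allocating a slot for every possible n-symbol string, which matters when the alphabet is a large set of image classes rather than 128 ASCII codes.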