Separability versus Prototypicality in Handwritten Word Retrieval

Jean-Paul van Oosten, Lambert Schomaker
2012 International Conference on Frontiers in Handwriting Recognition
User appreciation of a word-image retrieval system is based on the quality of the hit list returned for a query. Using support vector machines (SVMs) for ranking in large-scale handwritten document collections, we observed that many hit lists suffered from bad instances in the top ranks. An analysis of this problem revealed that two functions needed to be optimised, concerning separability and prototypicality. By ranking images in two stages, the number of distracting images is reduced, making the method ...y convenient for massive-scale, continuously trainable retrieval engines. Instead of cumbersome SVM training, we present a nearest-centroid method and show that precision improvements of up to 35 percentage points can be achieved, yielding up to 100% precision in data sets with many instances, while maintaining high recall.
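The abstract does not specify the features, the distance measure, or exactly how the two ranking stages are combined. What follows is a minimal sketch under stated assumptions, not the authors' method. It assumes that each word image is already a fixed-length feature vector and that Euclidean distance is used. Stage one (separability) keeps only candidates whose nearest class centroid is the query class. Stage two (prototypicality) orders the survivors by their distance to that centroid. The function names `centroids` and `two_stage_rank`, as well as the toy word classes "the" and "and", are hypothetical.

```python
import math
from collections import defaultdict


def centroids(features, labels):
    """Return the mean feature vector for each class label."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def two_stage_rank(query_class, candidates, cents):
    """Hypothetical two-stage ranking of candidate indices for one query class.

    Stage 1 (separability): discard candidates whose nearest centroid
    belongs to a different class.
    Stage 2 (prototypicality): sort the remaining candidates by Euclidean
    distance to the query class centroid, closest (most prototypical) first.
    """
    target = cents[query_class]
    kept = []
    for idx, x in enumerate(candidates):
        nearest = min(cents, key=lambda c: math.dist(x, cents[c]))
        if nearest == query_class:
            kept.append((math.dist(x, target), idx))
    kept.sort()
    return [idx for _, idx in kept]


# Toy 2-D "feature vectors" for two hypothetical word classes.
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["the", "the", "the", "and", "and", "and"]
cents = centroids(train_x, train_y)

candidates = [[0.3, 0.3], [2, 2], [4, 4], [1, 1], [5, 5]]
print(two_stage_rank("the", candidates, cents))  # [0, 3, 1]
print(two_stage_rank("and", candidates, cents))  # [4, 2]
```

Candidate `[4, 4]` lies closer to the "the" centroid than most candidates do in absolute terms, but it is nearer to the "and" centroid. For that reason, stage one removes it from the "the" hit list before any prototypicality ranking happens. Because this sketch needs only class means, adding training examples updates the centroids incrementally and avoids any retraining step. That property fits the abstract's point about continuously trainable engines, but the sketch does not reproduce the paper's actual pipeline.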
doi:10.1109/icfhr.2012.269 dblp:conf/icfhr/OostenS12 fatcat:qz5x47kbi5gu3pyykvmpbx6qfi