Annotating Historical Archives of Images

Xiaoyue Wang, Lexiang Ye, Eamonn J. Keogh, Christian R. Shelton
2010 International Journal of Digital Library Systems  
Recent initiatives like the Million Book Project and Google Print Library Project have already archived several million books in digital format, and within a few years a significant fraction of the world's books will be online. While the majority of the data will naturally be text, there will also be tens of millions of pages of images. Many of these images will defy automated annotation for the foreseeable future, but a considerable fraction of the images may be amenable to automatic annotation
... algorithms that can link the historical image with a modern contemporary, with its attendant metatags. In order to perform this linking, we must have a suitable distance measure that appropriately combines the relevant features of shape, color, texture and text. However, the best combination of these features will vary from application to application and even from one manuscript to another. In this work we propose a simple technique to learn the distance measure by perturbing the training set in a principled way. We show the utility of our ideas on archives of manuscripts containing images from natural history and cultural artifacts.
doi:10.4018/jdls.2010040104 fatcat:xjvikgbmrfgblenfcgcq7svv5i