The Perceived Similarity of Photos - A Test-Collection Based Evaluation Framework for the Content-Based Image Retrieval Algorithms
Content-based image retrieval (CBIR) algorithms have been seen as a promising access method for digital photo collections, one that could sooner or later replace traditional text-based methods. Unfortunately, we have very little evidence of the usefulness of these algorithms for real user needs and in real contexts of use. One problem is that appropriately designed test collections are not available even for basic performance testing of CBIR algorithms. This paper proposes a task-oriented evaluation framework and an efficient procedure for constructing test collections for CBIR algorithms. First, the paper defines a plausible function for these algorithms in general-purpose photo retrieval systems. We believe that CBIR algorithms could be applied effectively in conjunction with text-based photo retrieval. Text-based methods are powerful in retrieving topically related items but do not support browsing. CBIR algorithms could help in identifying visually similar photos within the (often large) result sets of textual queries. The proposed evaluation framework is based on the concept of perceived similarity and emphasises the role of expertise and realistic illustration tasks as a premise for similarity assessments. A major innovation of the proposed test collection is that it consists of an array of small test sets, each made up of a tiny database, a query photo, and the corresponding similarity assessments. This approach supports testing of prototype CBIR algorithms in short development cycles. The empirical part of the paper reports how journalists judged the similarity of photos while searching during simulated but realistic illustration tasks. The goal of the study was to try out the construction process of the test collection. The results show that the task-oriented evaluation framework and the proposed procedures for constructing the test collection can be applied successfully. The lessons learned from the simulated illustration tasks, the collection of similarity assessments, and the construction of the test collection are discussed.
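The abstract describes the test collection as an array of small test sets, each holding a tiny database, a query photo and similarity assessments, but it does not specify data formats or an evaluation measure. The following is a minimal Python sketch of how such a collection could be used to score a prototype CBIR ranking function, assuming graded similarity levels (0 = not similar), a threshold that turns them into binary "similar" judgements, and mean average precision as the measure. `TestSet`, `evaluate`, `identity_rank`, the threshold and the toy data are all hypothetical; `identity_rank` merely stands in for a real CBIR algorithm.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Set

# Hypothetical ranking-function type: (query photo id, database photo ids) -> ranked ids.
RankFn = Callable[[str, List[str]], List[str]]


@dataclass
class TestSet:
    """One small test set: a tiny database, a query photo and similarity assessments.

    Field names and the graded 0..2 scale are illustrative assumptions.
    """
    query_id: str
    database_ids: List[str]
    assessments: Dict[str, int]  # photo id -> assessed similarity level (0 = not similar)


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranking against the set of photos judged similar."""
    hits = 0
    precision_sum = 0.0
    for rank, photo_id in enumerate(ranking, start=1):
        if photo_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at the rank of this similar photo
    return precision_sum / len(relevant) if relevant else 0.0


def evaluate(test_sets: Sequence[TestSet], rank_fn: RankFn, threshold: int = 1) -> float:
    """Mean average precision of rank_fn over all test sets.

    Photos whose assessed level is >= threshold count as similar (an assumption).
    """
    scores = []
    for ts in test_sets:
        ranking = rank_fn(ts.query_id, ts.database_ids)
        relevant = {pid for pid, level in ts.assessments.items() if level >= threshold}
        scores.append(average_precision(ranking, relevant))
    return sum(scores) / len(scores) if scores else 0.0


def identity_rank(query_id: str, database_ids: List[str]) -> List[str]:
    """Placeholder 'algorithm' that keeps database order; a real CBIR prototype goes here."""
    return list(database_ids)


# Toy collection of two test sets (made-up ids and judgements).
test_sets = [
    TestSet("q1", ["a", "b", "c", "d"], {"a": 2, "b": 0, "c": 1, "d": 0}),
    TestSet("q2", ["e", "f", "g"], {"e": 0, "f": 0, "g": 2}),
]
mean_ap = evaluate(test_sets, identity_rank)
print(f"MAP = {mean_ap:.4f}")  # MAP = 0.5833
```

Because each test set is tiny, replacing `identity_rank` with a new prototype and re-running the evaluation is cheap, which matches the short development cycles the abstract mentions. Averaging per-test-set scores is one plausible design choice here, not a measure stated in the paper.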