Small codes and large image databases for recognition

Antonio Torralba, Rob Fergus, Yair Weiss
2008. 2008 IEEE Conference on Computer Vision and Pattern Recognition (IEEE). https://fatcat.wiki/container/ilwxppn4d5hizekyd3ndvy2mii
The Internet contains billions of images, freely available online. Methods for efficiently searching this incredibly rich resource are vital for a large number of applications. These include object recognition [2], computer graphics [11, 27], personal photo collections, and online image search tools. In this paper, our goal is to develop efficient image search and scene matching techniques that are not only fast, but also require very little memory, enabling their use on standard hardware or even on handheld devices. Our approach uses recently developed machine learning techniques to convert the Gist descriptor (a real-valued vector that describes orientation energies at different scales and orientations within an image) to a compact binary code, with a few hundred bits per image. Using our scheme, it is possible to perform real-time searches with millions of images from the Internet using a single large PC and obtain recognition results comparable to the full descriptor. Using our codes on high-quality labeled images from the LabelMe database gives surprisingly powerful recognition results using simple nearest neighbor techniques.

Recent interest in object recognition has yielded a wide range of approaches to describing the contents of an image. One important application for this technology is the visual search of large collections of images, such as those on the Internet or on people's home computers. Accordingly, a number of recognition papers have explored this area. Nister and Stewenius demonstrate real-time specific object recognition using a database of 40,000 images [19]; Obdrzalek and Matas show sub-linear indexing time on the COIL dataset [20]. A common theme is the representation of the image as a collection of feature vectors and the use of efficient data structures to handle the large number of images. These ideas are common to many approaches in the content-based image retrieval (CBIR) community, although the emphasis on really large datasets means that the chosen image representation is often relatively simple, e.g. color [6], wavelets [29] or crude segmentations [4]. The Cortina system [22] demonstrates real-time retrieval from a 10 million image collection, using a combination of texture and edge histogram features. See Datta et al. for a survey of such methods [5].

Figure 1. Short binary codes might be enough for recognition. This figure shows images reconstructed using an increasing number of bits (16, 64, 256, 1024 and 24576) and a compression algorithm similar to JPEG. The number on the left represents the number of bits used to compress each image. Reconstruction is done by adding a sparsity prior on image derivatives, which reduces typical JPEG artifacts. Many images are recognizable when compressed to around 256-1024 bits.

Our approach is based on binary codes for representing images and their neighborhood structure. Such codes have received limited attention in both the vision and CBIR communities. Ghosh et al. [7] use them to find duplicate images in a database. Binary codes have also been used to represent the color of an image [13, 18]. Landre et al. [14] use color, texture and shape cues in a 32-bit vector to perform retrieval on a 10,000 image dataset. These approaches also use manually designed descriptors, which, in view of the tiny capacity of each code, are likely to be highly suboptimal, particularly when the database is large, a scenario not investigated by any of these papers. Unlike CBIR, we seek to recognize the objects present
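
To make the idea of searching with short binary codes concrete, the following is a minimal sketch of nearest-neighbor lookup by Hamming distance. It is not the paper's method: the codes here come from random hyperplane projections (a locality-sensitive hashing stand-in), whereas the paper learns its codes from the Gist descriptor. All function names (`random_hyperplanes`, `encode`, `hamming`, `nearest`, `storage_bytes`) are hypothetical and chosen only for illustration.

```python
import random


def random_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian directions (assumption: a stand-in for learned codes)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def encode(descriptor, planes):
    """Pack one bit per hyperplane (1 if the projection is non-negative) into an int."""
    code = 0
    for w in planes:
        dot = sum(wi * xi for wi, xi in zip(w, descriptor))
        code = (code << 1) | (1 if dot >= 0.0 else 0)
    return code


def hamming(a, b):
    """Number of bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


def nearest(query_code, database_codes, k=1):
    """Indices of the k database codes closest to the query (linear scan, ties keep input order)."""
    order = sorted(range(len(database_codes)),
                   key=lambda i: hamming(query_code, database_codes[i]))
    return order[:k]


def storage_bytes(n_images, n_bits):
    """Memory needed to hold the codes alone, ignoring any index overhead."""
    return n_images * n_bits // 8
```

Under these assumptions, each comparison is a single XOR followed by a bit count, and the codes themselves are small: at 256 bits per image, one million images occupy 32,000,000 bytes of code storage.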
doi:10.1109/cvpr.2008.4587633 (https://doi.org/10.1109/cvpr.2008.4587633)
dblp:conf/cvpr/TorralbaFW08 (https://dblp.org/rec/conf/cvpr/TorralbaFW08.html)
fatcat:47tl537tzjejldlmtokrqipkfq (https://fatcat.wiki/release/47tl537tzjejldlmtokrqipkfq)