Local Deep Descriptors in Bag-of-Words for Image Retrieval

Jiewei Cao, Zi Huang, Heng Tao Shen
2017 Proceedings of the Thematic Workshops of ACM Multimedia 2017 - Thematic Workshops '17  
The Bag-of-Words (BoW) models using the SIFT descriptors have achieved great success in content-based image retrieval over the past decade. Recent studies show that the neuron activations of the convolutional neural networks (CNN) can be viewed as local descriptors, which can be aggregated into effective global descriptors for image retrieval. However, little work has been done on using these local deep descriptors in BoW models, especially in the case of large visual vocabularies. In this ... we provide the key ingredients to build an effective BoW model using deep descriptors. Specifically, we show how to use the CNN as a combination of local feature detector and extractor, without the need of feeding multiple image patches to the network. Moreover, we revisit the classic issues of BoW, including the burstiness and quantization error, in our scenario and improve the retrieval accuracy by addressing these problems. Lastly, we demonstrate that our model can scale up to large visual vocabularies, enjoying the advantages of both the sparseness of visual word histogram and the discriminative power of deep descriptor. Experiments show that our model achieves state-of-the-art performance on different datasets without re-ranking.
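To make the general idea concrete, the following is a minimal sketch of a BoW pipeline over local deep descriptors. It is not the authors' implementation. It assumes that each spatial location of a convolutional feature map is taken as one local descriptor, that descriptors are hard-assigned to their nearest visual word, and that burstiness is damped with a square root of the word counts (a common remedy, used here as an assumption). It does not address quantization error. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def feature_map_to_descriptors(feature_map):
    """Treat each spatial location of a C x H x W activation tensor
    (nested lists) as one C-dimensional local descriptor."""
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [
        [feature_map[c][y][x] for c in range(channels)]
        for y in range(height)
        for x in range(width)
    ]


def nearest_word(descriptor, vocabulary):
    """Hard assignment: index of the closest visual word
    under squared Euclidean distance."""
    def sq_dist(word):
        return sum((d - w) ** 2 for d, w in zip(descriptor, word))
    return min(range(len(vocabulary)), key=lambda i: sq_dist(vocabulary[i]))


def bow_histogram(descriptors, vocabulary):
    """Sparse BoW vector {word_id: weight}. Counts are damped with a
    square root (assumed burstiness remedy) and then L2-normalized."""
    counts = Counter(nearest_word(d, vocabulary) for d in descriptors)
    damped = {w: math.sqrt(c) for w, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in damped.values()))
    if norm == 0:
        return {}
    return {w: v / norm for w, v in damped.items()}


def similarity(h1, h2):
    """Dot product of two sparse histograms (cosine similarity
    when both are L2-normalized)."""
    return sum(v * h2.get(w, 0.0) for w, v in h1.items())


# Toy example: a 2-channel, 2 x 2 feature map and a 2-word vocabulary.
toy_map = [
    [[1.0, 0.9], [0.0, 1.0]],  # channel 0
    [[0.0, 0.1], [1.0, 0.0]],  # channel 1
]
toy_vocabulary = [[1.0, 0.0], [0.0, 1.0]]
toy_hist = bow_histogram(feature_map_to_descriptors(toy_map), toy_vocabulary)
```

Because the histogram is stored as a dictionary of non-zero entries only, the representation stays sparse as the vocabulary grows, which is the property the abstract refers to for large visual vocabularies.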
doi:10.1145/3126686.3127018 dblp:conf/mm/CaoHS17 fatcat:pudntrtbjnbzpmcsfj2ytplqga