Semi-supervised Hashing with Semantic Confidence for Large Scale Visual Search
Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval - SIGIR '15
Similarity search is one of the fundamental problems for large scale multimedia applications. Hashing techniques, as one popular strategy, have been intensively investigated owing to the speed and memory efficiency. Recent research has shown that leveraging supervised information can lead to high quality hashing. However, most existing supervised methods learn hashing function by treating each training example equally while ignoring the different semantic degree related to the label, i.e.
... ic confidence, of different examples. In this paper, we propose a novel semi-supervised hashing framework by leveraging semantic confidence. Specifically, a confidence factor is first assigned to each example by neighbor voting and click count in the scenarios with label and click-through data, respectively. Then, the factor is incorporated into the pairwise and triplet relationship learning for hashing. Furthermore, the two learnt relationships are seamlessly encoded into semi-supervised hashing methods with pairwise and listwise supervision respectively, which are formulated as minimizing empirical error on the labeled data while maximizing the variance of hash bits or minimizing quantization loss over both the labeled and unlabeled data. In addition, the kernelized variant of semi-supervised hashing is also presented. We have conducted experiments on both CIFAR-10 (with label) and Clickture (with click data) image benchmarks (up to one million image examples), demonstrating that our approaches outperform the state-ofthe-art hashing techniques.