Online Hashing for Scalable Remote Sensing Image Retrieval

Peng Li, Xiaoyu Zhang, Xiaobin Zhu, Peng Ren
2018 Remote Sensing  
Recently, hashing-based large-scale remote sensing (RS) image retrieval has attracted much attention. Many new hashing algorithms have been developed and successfully applied to fast RS image retrieval tasks. However, there exists an important problem rarely addressed in the research literature of RS image hashing. The RS images are practically produced in a streaming manner in many real-world applications, which means the data distribution keeps changing over time. Most existing RS image
hashing methods are batch-based models whose hash functions are learned once and then kept fixed. Therefore, the pre-trained hash functions might not fit the ever-growing stream of new RS images. Moreover, the batch-based models have to load all the training images into memory for model learning, which consumes substantial computing and memory resources. To address these deficiencies, we propose a new online hashing method, which learns and adapts its hash functions to the newly incoming RS images by means of a novel online partial random learning scheme. Our hash model is updated sequentially, so that the representative power of the learned binary codes for RS images is improved accordingly. Moreover, benefiting from the online learning strategy, our proposed hashing approach is well suited to scalable real-world remote sensing image retrieval. Extensive experiments on two large-scale RS image databases under an online setting demonstrated the efficacy and effectiveness of the proposed method.
volume of remote sensing images is oversized. As an effective method to manage a large number of images, content-based image retrieval (CBIR) can retrieve the images of interest according to their visual content. In recent years, content-based RS image retrieval has been comprehensively studied [1-4], in which the similarity of RS images is measured by different kinds of visual descriptors. More specifically, local invariant [5], morphological [6], textural [7-9], and data-driven features [10-13] have been evaluated on content-based RS image retrieval tasks. To further improve retrieval performance, Li et al. [14] proposed a multiple feature-based remote sensing image retrieval approach that combines handcrafted features and data-driven features via unsupervised feature learning. Wang et al. [15] proposed a multilayered graph model for hierarchically refining retrieval results from coarse to fine.
Although some encouraging progress has been made, content-based RS image retrieval still faces a great challenge. The aforementioned visual descriptors can have hundreds or even thousands of dimensions. Exhaustively comparing the high-dimensional feature descriptor of a query remote sensing image with that of every image in the retrieval set is computationally expensive and becomes infeasible on an oversized database. Besides, the storage of the image descriptors is also a bottleneck for large-scale RS image retrieval. Hashing is a potential solution for big data retrieval because of its ability to represent features compactly. Hashing methods map the input images from the high-dimensional feature space to a low-dimensional code space, i.e., Hamming space, where each image is represented by a short binary code. Image retrieval over such binary codes is extremely fast, because the Hamming distance between two binary codes can be calculated efficiently with an XOR operation on a modern CPU. Moreover, the binary code representation significantly reduces the amount of memory required for storing the content information of a large image collection.
Existing hashing approaches can be broadly categorized as data-independent and data-dependent schemes. Data-independent methods usually adopt random projections as hash functions without using any training data. One representative data-independent method is Locality Sensitive Hashing (LSH) [16-18], which projects data points onto random hyperplanes and then applies random thresholding. Although this data-independent random scheme is computationally efficient, it usually does not achieve satisfactory retrieval results because it ignores the structure of the image data. Moreover, to achieve a reasonable recall rate, LSH-based methods typically require long codes and multiple hash tables, which degrades search efficiency in practice.
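To make the two ideas above concrete, the following is a minimal sketch of random-projection hashing followed by Hamming-distance ranking. It is not the method of this paper or of any cited work. It assumes Gaussian random directions and a fixed zero threshold (sign random projection) instead of random thresholding, and the names `make_hyperplanes`, `hash_code`, `hamming`, and `retrieve` are hypothetical, chosen only for illustration.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw one random Gaussian direction per hash bit (no training data used)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_code(x, hyperplanes):
    """Pack the sign of each projection w . x into a single Python int.

    Assumption: a zero threshold is used here, not the random
    thresholding mentioned in the text.
    """
    code = 0
    for w in hyperplanes:
        dot = sum(wi * xi for wi, xi in zip(w, x))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming(a, b):
    """Hamming distance: XOR marks the differing bits, then count the ones."""
    return bin(a ^ b).count("1")


def retrieve(query, database_codes, hyperplanes, top_k=3):
    """Return indices of the top_k database codes closest to the query code."""
    q = hash_code(query, hyperplanes)
    ranked = sorted(range(len(database_codes)),
                    key=lambda i: hamming(q, database_codes[i]))
    return ranked[:top_k]


if __name__ == "__main__":
    planes = make_hyperplanes(num_bits=16, dim=3)
    # Toy feature vectors standing in for image descriptors.
    database = [[0.9, 0.1, 0.0], [-1.0, 0.5, 2.0], [1.0, 2.0, 3.0]]
    codes = [hash_code(v, planes) for v in database]
    print(retrieve([1.0, 2.0, 3.0], codes, planes, top_k=2))
```

Each descriptor is stored as one integer, so comparing a query against the database costs one XOR and one bit count per image. Because the directions are drawn without looking at the data, many bits may be needed before the ranking becomes useful, which is the weakness of data-independent schemes noted above.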
In contrast, data-dependent hashing methods attempt to learn data-aware hash codes by means of various machine learning techniques, and they are usually demonstrated to be more effective than data-independent LSH. Data-dependent hashing can be further divided into unsupervised hashing [19-23] and supervised hashing methods [24-30]. For example, spectral hashing [19] and Principal Component Analysis (PCA) based hashing methods [20] belong to the unsupervised category, which does not use the label information of training images when learning the binary codes. Supervised hashing approaches, such as kernel-based supervised hashing [25], supervised discrete hashing [27] and deep hashing methods [29], incorporate the label information to learn semantic hash functions. Owing to the great success of hashing in natural image retrieval, many efforts have recently been devoted to developing efficient hashing methods for large-scale RS image retrieval. More specifically, kernel-based nonlinear hashing was first introduced into the remote sensing community by Demir and Bruzzone [31]. Then, Li and Ren [32] proposed a novel unsupervised hashing method named partial randomness hashing (PRH) for efficient hash function construction. In [33], a novel large-scale RS image retrieval approach was proposed based on deep hashing neural networks under the supervision of labeled images. Ye et al. [34] proposed a multiple-feature learning framework for the large-scale RS image hashing problem, which takes multiple complementary features as input and learns hybrid hash functions. Although the hashing-based RS image retrieval methods have achieved some improvements for large-scale applications, there exist two important problems that are rarely exploited in the existing RS
doi:10.3390/rs10050709 fatcat:2dbb6rhsb5agha4ve5e6t7dmj4