Deep Multi-level Semantic Hashing for Cross-modal Retrieval

Zhenyan Ji, Weina Yao, Wei Wei, Houbing Song, Huaiyu Pi
2019 IEEE Access  
With the rapid growth of multimodal data, cross-modal search has attracted wide research interest. Owing to their efficiency in storage and computation, hashing-based methods are broadly used for large-scale cross-modal retrieval. Most existing hashing methods are designed with binary supervision, which collapses the complex relationships of multi-label data into a simple similar/dissimilar distinction. However, few methods have explored the rich semantic information implicit in multi-label data to improve the accuracy of search results. In this paper, a multi-level semantic supervision generating approach is proposed by exploring label relevance, and a deep hashing framework is designed for multi-label image-text cross-modal retrieval tasks. The framework simultaneously captures the binary similarity and the complex multi-level semantic structure of data in different modalities. Moreover, the effects of three different convolutional neural networks, CNN-F, VGG-16, and ResNet-50, on the retrieval results are compared. Experimental results on an open-source cross-modal dataset show that our approach outperforms several state-of-the-art hashing methods, and that retrieval with the CNN-F network is better than with VGG-16 or ResNet-50.

INDEX TERMS: Cross-modal retrieval, deep learning, hashing method, multi-label learning.
doi:10.1109/access.2019.2899536
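To make the contrast between binary and multi-level supervision concrete, the sketch below derives both kinds of similarity from multi-label annotations. The graded measure chosen here (cosine similarity of label vectors) is an illustrative assumption, not necessarily the formulation used in the paper.

import numpy as np

def binary_similarity(labels_a: np.ndarray, labels_b: np.ndarray) -> np.ndarray:
    """Conventional supervision: 1 if two samples share at least one label, else 0.
    labels_a: (n, c) and labels_b: (m, c) binary multi-label matrices."""
    overlap = labels_a @ labels_b.T            # (n, m) counts of shared labels
    return (overlap > 0).astype(np.float32)

def multilevel_similarity(labels_a: np.ndarray, labels_b: np.ndarray) -> np.ndarray:
    """Multi-level supervision: graded similarity based on label relevance.
    Cosine similarity of label vectors is used here (an assumption), so pairs
    sharing more labels receive higher supervision values in [0, 1]."""
    norm_a = np.linalg.norm(labels_a, axis=1, keepdims=True) + 1e-12
    norm_b = np.linalg.norm(labels_b, axis=1, keepdims=True) + 1e-12
    return (labels_a @ labels_b.T) / (norm_a * norm_b.T)

if __name__ == "__main__":
    # Three samples annotated with 4 possible labels.
    L = np.array([[1, 1, 0, 0],
                  [1, 1, 1, 0],
                  [0, 0, 0, 1]], dtype=np.float32)
    print(binary_similarity(L, L))      # collapses all overlapping pairs to 1
    print(multilevel_similarity(L, L))  # preserves degrees of relatedness

In this toy example, the first two samples share two of their labels and therefore receive a high (but not maximal) multi-level similarity, whereas binary supervision would treat them the same as any pair sharing a single label.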