Deep Attribute-preserving Metric Learning for Natural Language Object Retrieval

Jianan Li, Yunchao Wei, Xiaodan Liang, Fang Zhao, Jianshu Li, Tingfa Xu, Jiashi Feng
2017 Proceedings of the 2017 ACM on Multimedia Conference - MM '17  
Retrieving image content with a natural language expression is an emerging interdisciplinary problem at the intersection of multimedia, natural language processing and artificial intelligence. Existing methods tackle this challenging problem by learning features from the visual and linguistic domains independently while the critical semantic correlations bridging two domains have been underexplored in the feature learning process. In this paper, we propose to exploit sharable semantic ... as "anchors" to ensure the learned features are well aligned across domains for better object retrieval. We define "attributes" as the common concepts that are informative for object retrieval and can be easily learned from both visual content and language expression. In particular, diverse and complex attributes (e.g., location, color, category, interaction between object and context) are modeled and incorporated to promote cross-domain alignment for feature learning from multiple perspectives. Based on the sharable attributes, we propose a deep Attribute-Preserving Metric learning (AP-Metric) framework that jointly generates unique query-sensitive region proposals and conducts novel cross-modal feature learning that explicitly pursues consistency over semantic attribute abstraction within both domains for deep metric learning. Benefiting from the cross-modal semantic correlations, our proposed framework can localize challenging visual objects to match complex query expressions within cluttered background accurately. The overall framework is end-to-end trainable. Extensive evaluations on popular datasets including ReferItGame [18], RefCOCO, and RefCOCO+ [43] well demonstrate its superiority. Notably, it achieves state-of-the-art performance on the challenging ReferItGame dataset.
doi:10.1145/3123266.3123439 dblp:conf/mm/LiWLZLXF17 fatcat:rcirnxocmrbdbhogyuq4hmanji