Cross-media semantic representation via bi-directional learning to rank

Fei Wu, Xinyan Lu, Zhongfei Zhang, Shuicheng Yan, Yong Rui, Yueting Zhuang
2013 Proceedings of the 21st ACM international conference on Multimedia - MM '13  
In multimedia information retrieval, most classic approaches tend to represent different modalities of media in the same feature space. Existing approaches take either one-to-one paired data or uni-directional ranking examples (i.e., utilizing only text-query-image ranking examples or image-query-text ranking examples) as training examples, which do not make full use of bi-directional ranking examples (bi-directional ranking means that both text-query-image and image-query-text ranking examples are utilized in the training period) to achieve a better performance. In this paper, we consider learning a cross-media representation model from the perspective of optimizing a listwise ranking problem while taking advantage of bi-directional ranking examples. We propose a general cross-media ranking algorithm to optimize the bi-directional listwise ranking loss with a latent space embedding, which we call Bi-directional Cross-Media Semantic Representation Model (Bi-CMSRM). The latent space embedding is discriminatively learned by the structural large margin learning for optimization with certain ranking criteria (mean average precision in this paper) directly. We evaluate Bi-CMSRM on the Wikipedia and NUS-WIDE datasets and show that the utilization of the bi-directional ranking examples achieves a much better performance than only using the uni-directional ranking examples.
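To make the bi-directional ranking criterion concrete, the following is a minimal sketch of how mean average precision could be measured in both retrieval directions once text and image features share a latent space. It is not the authors' implementation and does not perform the structural large margin optimization. The linear projection matrices `U` and `V`, the dot-product score, the label-match relevance rule, and the function names are all assumptions made for illustration.

```python
import numpy as np


def average_precision(scores, relevant):
    """Average precision of the ranking induced by `scores` (higher ranks first)."""
    order = np.argsort(-scores, kind="stable")
    hits = relevant[order].astype(float)
    if hits.sum() == 0:
        return 0.0
    precision_at_k = np.cumsum(hits) / (np.arange(len(hits)) + 1)
    return float((precision_at_k * hits).sum() / hits.sum())


def bidirectional_map(texts, images, U, V, text_labels, image_labels):
    """Return (text-query-image MAP, image-query-text MAP).

    Assumption: U and V are hypothetical linear maps into a shared latent
    space, and an item is relevant to a query when their labels match.
    """
    latent_text = texts @ U        # shape (n_texts, k)
    latent_image = images @ V      # shape (n_images, k)
    scores = latent_text @ latent_image.T
    relevant = text_labels[:, None] == image_labels[None, :]

    # Each text queries the images: one ranking per row of the score matrix.
    text_to_image = np.mean(
        [average_precision(scores[q], relevant[q]) for q in range(scores.shape[0])]
    )
    # Each image queries the texts: one ranking per column of the score matrix.
    image_to_text = np.mean(
        [average_precision(scores[:, q], relevant[:, q]) for q in range(scores.shape[1])]
    )
    return float(text_to_image), float(image_to_text)
```

A uni-directional setup would look at only one of the two returned values, while a bi-directional objective of the kind the abstract describes takes both directions into account during training.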
doi:10.1145/2502081.2502097 dblp:conf/mm/WuLZYRZ13 fatcat:ni7x2naeavcgdalyqnrg4ug464