Multi-Modal Knowledge Representation Learning via Webly-Supervised Relationships Mining

Fudong Nian, Bing-Kun Bao, Teng Li, Changsheng Xu
Proceedings of the 2017 ACM on Multimedia Conference (MM '17)
Knowledge representation learning (KRL) encodes large amounts of structured information, in the form of entities and relations, into a continuous low-dimensional semantic space. Most conventional methods focus solely on learning knowledge representations from a single modality and neglect the complementary information available from other modalities. The increasingly rich multi-modal data on the Internet also motivates us to explore a novel approach to KRL in a multi-modal manner and to overcome the limitations of previous single-modal based methods. This paper proposes a novel multi-modal knowledge representation learning (MM-KRL) framework which attempts to handle knowledge from both textual and visual web data. It consists of two stages, i.e., webly-supervised multi-modal relationship mining and bi-enhanced cross-modal knowledge representation learning. Compared with existing knowledge representation methods, our framework has several advantages: (1) It can effectively mine multi-modal knowledge with structured textual and visual relationships from the web automatically. (2) It is able to learn a common knowledge space which is independent of both task and modality through the proposed Bi-enhanced Cross-modal Deep Neural Network (BC-DNN). (3) It has the ability to represent unseen multi-modal relationships by transferring the learned knowledge of isolated seen entities and relations to unseen relationships. We build a large-scale multi-modal relationship dataset (MMR-D), and the experimental results show that our framework achieves excellent performance in zero-shot multi-modal retrieval and visual relationship recognition.
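To make the idea of a shared, modality-independent knowledge space concrete, the sketch below shows a generic two-branch cross-modal embedding model trained with an in-batch ranking loss. This is only an illustration of the general technique the abstract describes: the layer sizes, feature dimensions, and loss are assumptions, and the paper's actual BC-DNN (including its bi-enhancement mechanism) is not reproduced here.

```python
# Illustrative sketch (not the paper's BC-DNN): two branches project textual and
# visual relationship features into a common space, trained so that matched
# text-visual pairs score higher than mismatched ones.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossModalEmbedder(nn.Module):
    def __init__(self, text_dim=300, visual_dim=4096, common_dim=512):
        super().__init__()
        # Separate branches map each modality into the shared space.
        self.text_branch = nn.Sequential(
            nn.Linear(text_dim, common_dim), nn.ReLU(),
            nn.Linear(common_dim, common_dim),
        )
        self.visual_branch = nn.Sequential(
            nn.Linear(visual_dim, common_dim), nn.ReLU(),
            nn.Linear(common_dim, common_dim),
        )

    def forward(self, text_feat, visual_feat):
        # L2-normalise so cosine similarity serves as the matching score.
        t = F.normalize(self.text_branch(text_feat), dim=-1)
        v = F.normalize(self.visual_branch(visual_feat), dim=-1)
        return t, v


def ranking_loss(t, v, margin=0.2):
    """Bidirectional triplet-style loss using in-batch negatives."""
    scores = t @ v.t()                       # pairwise cosine similarities
    pos = scores.diag().unsqueeze(1)         # scores of matched pairs
    cost_v = (margin + scores - pos).clamp(min=0)      # text -> visual direction
    cost_t = (margin + scores - pos.t()).clamp(min=0)  # visual -> text direction
    mask = torch.eye(scores.size(0), dtype=torch.bool)
    return cost_v.masked_fill(mask, 0).mean() + cost_t.masked_fill(mask, 0).mean()


# Toy usage with random features standing in for mined relationship descriptors.
model = CrossModalEmbedder()
t, v = model(torch.randn(8, 300), torch.randn(8, 4096))
ranking_loss(t, v).backward()
```

Because both modalities land in the same normalized space, retrieval (including the zero-shot setting mentioned above) reduces to nearest-neighbour search by cosine similarity between embedded queries and candidates.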
doi:10.1145/3123266.3123443 dblp:conf/mm/NianBLX17 fatcat:e5wyg4iykzgexcb6vohf2okshm