Cross-Modal Representation [chapter]

Zhiyuan Liu, Yankai Lin, Maosong Sun
2020 Representation Learning for Natural Language Processing  
Cross-modal representation learning is an essential part of representation learning that aims to learn latent semantic representations across modalities such as text, audio, images, and video. In this chapter, we first introduce typical cross-modal representation models. We then review several real-world applications of cross-modal representation learning, including image captioning, visual relation detection, and visual question answering.
doi:10.1007/978-981-15-5573-2_9 fatcat:duazhghcevejzd27ncvzdpshqq