A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2017; you can also visit the original URL.
The file type is application/pdf.
Multimodal Convolutional Neural Networks for Matching Image and Sentence
2015 IEEE International Conference on Computer Vision (ICCV)
In this paper, we propose multimodal convolutional neural networks (m-CNNs) for matching image and sentence. Our m-CNN provides an end-to-end framework with convolutional architectures to exploit image representation, word composition, and the matching relations between the two modalities. More specifically, it consists of one image CNN encoding the image content and one matching CNN modeling the joint representation of image and sentence. The matching CNN composes different semantic fragments […]
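The abstract describes two components: an image CNN that encodes the image, and a matching CNN that fuses the image vector with sentence fragments and pools them into a joint representation that is scored. Below is a minimal, shape-level sketch of that word-level fusion idea in NumPy. It is not the paper's implementation: all dimensions, the single-layer projections, the width-2 word "fragments", and the max-over-time pooling are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not taken from the paper): image feature size,
# word embedding size, sentence length, and joint-space dimension.
D_IMG, D_WORD, SENT_LEN, D_JOINT = 128, 64, 8, 32

def image_cnn(image_feat, W_img):
    # Stand-in for the image CNN: project a precomputed image feature
    # vector into the joint space.
    return np.tanh(W_img @ image_feat)

def matching_cnn(words, img_vec, W_conv):
    # Fuse the image vector with each width-2 word window ("fragment"),
    # apply one shared linear map per fragment (a 1-D convolution),
    # then max-over-time pool across fragments.
    fused = [np.concatenate([words[i], words[i + 1], img_vec])
             for i in range(len(words) - 1)]
    acts = np.stack([np.tanh(W_conv @ f) for f in fused])  # (SENT_LEN-1, D_JOINT)
    return acts.max(axis=0)

def match_score(image_feat, words, params):
    # Scalar image-sentence matching score from the joint representation.
    W_img, W_conv, w_out = params
    img_vec = image_cnn(image_feat, W_img)
    joint = matching_cnn(words, img_vec, W_conv)
    return float(w_out @ joint)

params = (rng.standard_normal((D_JOINT, D_IMG)) * 0.1,
          rng.standard_normal((D_JOINT, 2 * D_WORD + D_JOINT)) * 0.1,
          rng.standard_normal(D_JOINT) * 0.1)

image_feat = rng.standard_normal(D_IMG)
sentence = rng.standard_normal((SENT_LEN, D_WORD))
score = match_score(image_feat, sentence, params)
```

In a trained system the score would be learned with a ranking loss over matched and mismatched image-sentence pairs; here the random weights only demonstrate the data flow and shapes.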
doi:10.1109/iccv.2015.301
dblp:conf/iccv/MaLSL15
fatcat:fv3kzu4iz5ghplyzwofob4revy