Boosting Cross-modal Retrieval with MVSE++ and Reciprocal Neighbors
2020
IEEE Access
In this paper, we propose to boost cross-modal retrieval by mutually aligning images and captions on the aspects of both features and relationships. First, we propose a multi-feature based visual-semantic embedding (MVSE++) space to retrieve candidates in the other modality, which provides a more comprehensive representation of the visual content of objects and scene context in images. Thus, we are more likely to find an accurate and detailed caption for an image. However, …
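To make the retrieval step concrete, the sketch below shows nearest-neighbor search by cosine similarity in a joint visual-semantic embedding space, followed by a reciprocal-neighbor check of the kind the title refers to. It assumes image and caption embeddings have already been projected into the shared MVSE++ space; all function names and the NumPy-based formulation are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    # Unit-normalize rows so dot products equal cosine similarity.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def retrieve(image_embs, caption_embs, k=5):
    # Rank captions for each image by cosine similarity in the joint space.
    sims = l2_normalize(image_embs) @ l2_normalize(caption_embs).T
    return np.argsort(-sims, axis=1)[:, :k], sims

def reciprocal_neighbors(sims, k=5):
    # Keep a pair (i, j) only if caption j is a top-k neighbor of image i
    # AND image i is a top-k neighbor of caption j (mutual ranking).
    img_topk = np.argsort(-sims, axis=1)[:, :k]    # per-image caption ranks
    cap_topk = np.argsort(-sims.T, axis=1)[:, :k]  # per-caption image ranks
    pairs = []
    for i, caps in enumerate(img_topk):
        for j in caps:
            if i in cap_topk[j]:
                pairs.append((i, int(j)))
    return pairs

if __name__ == "__main__":
    # Toy embeddings standing in for MVSE++ outputs.
    rng = np.random.default_rng(0)
    img = rng.normal(size=(8, 64))   # 8 image embeddings
    cap = rng.normal(size=(20, 64))  # 20 caption embeddings
    topk, sims = retrieve(img, cap, k=5)
    print(reciprocal_neighbors(sims, k=5))
```

The reciprocal check discards one-directional matches, which is one simple way to enforce the mutual image-caption alignment the abstract describes.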
doi:10.1109/access.2020.2992187
fatcat:dhtacsefubhyvopltkjtudqrgi