A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; the original URL is http://downloads.hindawi.com/journals/mpe/2020/2503137.pdf (file type: application/pdf).
Abstraction and Association: Cross-Modal Retrieval Based on Consistency between Semantic Structures
2020
Mathematical Problems in Engineering (Hindawi Limited)
doi:10.1155/2020/2503137 (https://doi.org/10.1155/2020/2503137)
fatcat:37cxcuimjbfa7mdibjmeduztwe (https://fatcat.wiki/release/37cxcuimjbfa7mdibjmeduztwe)

Cross-modal retrieval aims to find relevant data across different modalities, such as images and text. To bridge the modality gap, most existing methods require many coupled sample pairs as training data. To reduce the demand for training data, we propose a cross-modal retrieval framework that utilizes both coupled and uncoupled samples. The framework consists of two parts: Abstraction, which aims to provide high-level single-modal representations from uncoupled samples, and Association, which links different modalities through a few coupled training samples. Moreover, under this framework, we implement a cross-modal retrieval method based on the consistency between the semantic structures of multiple modalities. First, both images and text are given a semantic structure-based representation, which encodes each sample as its similarities to reference points generated by single-modal clustering. Then, the reference points of the different modalities are aligned through an active learning strategy. Finally, cross-modal similarity is measured by the consistency between the semantic structures. The experimental results demonstrate that, given a proper abstraction of single-modal data, the relationship between different modalities can be simplified, and even limited coupled cross-modal training data suffice for satisfactory retrieval accuracy.
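The abstract suggests a simple pipeline: cluster each modality separately, represent every sample by its similarities to the resulting reference points, align the reference points across modalities, and score cross-modal pairs by how consistent the two semantic structures are. The sketch below is a minimal reading of that description, not the paper's implementation: k-means as the clustering method, cosine similarity, the feature dimensions, and the function names are all assumptions, and the active-learning alignment step is taken as already done (reference point i in one modality is assumed to correspond to reference point i in the other).

```python
import numpy as np
from sklearn.cluster import KMeans

def semantic_structure_repr(features, n_refs=8, seed=0):
    """Represent each sample by its cosine similarity to reference points
    obtained from single-modal clustering (k-means is an assumption; the
    abstract only says the reference points come from clustering)."""
    refs = KMeans(n_clusters=n_refs, n_init=10,
                  random_state=seed).fit(features).cluster_centers_
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    r = refs / np.linalg.norm(refs, axis=1, keepdims=True)
    return f @ r.T  # shape: (n_samples, n_refs)

def cross_modal_similarity(img_repr, txt_repr):
    """Score image/text pairs by the consistency of their semantic
    structures, assuming the two sets of reference points are already
    aligned so that representation dimensions correspond."""
    a = img_repr / np.linalg.norm(img_repr, axis=1, keepdims=True)
    b = txt_repr / np.linalg.norm(txt_repr, axis=1, keepdims=True)
    return a @ b.T  # (n_images, n_texts) consistency scores

# Toy usage with random stand-ins for single-modal features.
rng = np.random.default_rng(0)
img_feats = rng.normal(size=(100, 512))  # e.g. image CNN features (hypothetical)
txt_feats = rng.normal(size=(80, 300))   # e.g. text embeddings (hypothetical)
scores = cross_modal_similarity(semantic_structure_repr(img_feats),
                                semantic_structure_repr(txt_feats))
```

Note that the two modalities never share a feature space here; only the (n_refs)-dimensional structure vectors are compared, which is why only a few coupled samples are needed, namely enough to align the reference points.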
Fulltext PDF (Web Archive): https://web.archive.org/web/20200508163343/http://downloads.hindawi.com/journals/mpe/2020/2503137.pdf