A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2019; you can also visit <a rel="external noopener" href="https://arxiv.org/pdf/1611.05490v1.pdf">the original URL</a>. The file type is <code>application/pdf</code>.
<span class="release-stage" >pre-print</span>
The "CNN-RNN" design pattern is increasingly widely applied in a variety of image annotation tasks, including multi-label classification and captioning. Existing models use the weakly semantic CNN hidden layer, or a transform of it, as the image embedding that provides the interface between the CNN and RNN. This leaves the RNN overstretched with two jobs: predicting the visual concepts and modelling their correlations for generating structured annotation output. Importantly, this makes the end-to-end training of the CNN and RNN slow and ineffective, due to the difficulty of back-propagating gradients through the RNN to train the CNN. We propose a simple modification to the design pattern that makes learning more effective and efficient. Specifically, we propose to use a semantically regularised embedding layer as the interface between the CNN and RNN. Regularising the interface can partially or completely decouple the learning problems, allowing each to be trained more effectively and making joint training much more efficient. Extensive experiments show that state-of-the-art performance is achieved on multi-label classification as well as image captioning.
<span class="external-identifiers"> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1611.05490v1">arXiv:1611.05490v1</a> <a target="_blank" rel="external noopener" href="https://fatcat.wiki/release/hedw7uovwjgftp2wwi4uvaoi7y">fatcat:hedw7uovwjgftp2wwi4uvaoi7y</a> </span>
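The idea of a semantically regularised interface can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the layer or the losses, so the sigmoid concept probabilities, the binary cross-entropy term, the weighted sum, and every function name below are assumptions made for illustration only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def semantic_embedding(cnn_features, weights, biases):
    """Map CNN features to per-concept probabilities.

    Assumption: the interface is a linear layer followed by a sigmoid,
    so each output unit can be read as "is concept k present?".
    """
    return [
        sigmoid(sum(w * f for w, f in zip(row, cnn_features)) + b)
        for row, b in zip(weights, biases)
    ]


def concept_loss(probs, labels, eps=1e-12):
    """Binary cross-entropy against ground-truth concepts.

    This is the (assumed) regulariser: it supervises the interface
    directly, so the CNN gets gradients that do not pass through the RNN.
    """
    total = sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for p, y in zip(probs, labels)
    )
    return -total / len(labels)


def sequence_loss(step_probs, target_ids, eps=1e-12):
    """Mean negative log-likelihood of the target tokens under the
    per-step output distributions of the RNN."""
    total = sum(math.log(dist[t] + eps) for dist, t in zip(step_probs, target_ids))
    return -total / len(target_ids)


def joint_loss(probs, labels, step_probs, target_ids, weight=1.0):
    """Assumed joint objective: RNN loss plus weighted interface loss."""
    return sequence_loss(step_probs, target_ids) + weight * concept_loss(probs, labels)
```

With `weight=0.0` the sketch reduces to a plain CNN-RNN objective, in which the CNN is trained only by gradients flowing back through the RNN. A positive weight adds direct supervision at the interface, which is the sense in which the two learning problems become partially decoupled.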
<a target="_blank" rel="noopener" href="https://web.archive.org/web/20191027150034/https://arxiv.org/pdf/1611.05490v1.pdf" title="fulltext PDF download" data-goatcounter-click="serp-fulltext" data-goatcounter-title="serp-fulltext"> <button class="ui simple right pointing dropdown compact black labeled icon button serp-button"> <i class="icon ia-icon"></i> Web Archive [PDF] <div class="menu fulltext-thumbnail"> <img src="https://blobs.fatcat.wiki/thumbnail/pdf/19/15/19158dfe2815e7f9eebc5822687e83d0a89ae147.180px.jpg" alt="fulltext thumbnail" loading="lazy"> </div> </button> </a> <a target="_blank" rel="external noopener" href="https://arxiv.org/abs/1611.05490v1" title="arxiv.org access"> <button class="ui compact blue labeled icon button serp-button"> <i class="file alternate outline icon"></i> arxiv.org </button> </a>