A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020.
Unsupervised image captioning without annotations is an emerging challenge in computer vision, and existing methods usually adopt GAN (Generative Adversarial Network) models. In this paper, we propose a novel memory-based network instead of a GAN, named the Recurrent Relational Memory Network (R²M). Unlike adversarial learning, which is complicated, sensitive to training, and performs poorly on long sentence generation, R²M implements a concepts-to-sentence memory translator through two-stage memory …
arXiv:2006.13611v1 fatcat:w5uwfq6tknevzin2zoxj5gpgy4