Geometry Attention Transformer with Position-aware LSTMs for Image Captioning
[article] · 2021 · arXiv pre-print
In recent years, transformer architectures have been widely applied to image captioning with impressive performance. For good captioning results, the geometry and position relations among different visual objects are often regarded as crucial information. To further improve transformer-based image captioning, this paper proposes an improved Geometry Attention Transformer (GAT) model. To better leverage geometric information, two novel geometry-aware architectures are designed
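As a rough illustration of the geometry-attention idea the abstract names, the Python/PyTorch sketch below biases standard scaled dot-product attention logits with pairwise bounding-box geometry features (log-scaled center offsets and size ratios, in the spirit of relation attention for object detection). This is a minimal sketch, not the paper's GAT module: the class name GeometryBiasedAttention, the (cx, cy, w, h) box format, and the geometry MLP are all assumptions made for illustration.

# Minimal sketch of geometry-aware attention over detected object regions.
# NOT the paper's exact GAT architecture; shapes, names, and the geometry
# featurization are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class GeometryBiasedAttention(nn.Module):
    def __init__(self, dim: int, geo_dim: int = 64):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        # Maps each pair's 4-d geometry feature to a scalar attention bias.
        self.geo_mlp = nn.Sequential(
            nn.Linear(4, geo_dim), nn.ReLU(), nn.Linear(geo_dim, 1)
        )
        self.scale = dim ** -0.5

    def forward(self, feats, boxes):
        # feats: (N, dim) region features; boxes: (N, 4) as (cx, cy, w, h).
        q, k, v = self.q(feats), self.k(feats), self.v(feats)
        logits = (q @ k.t()) * self.scale                    # (N, N) content logits

        # Pairwise relative geometry: normalized, log-scaled offsets and ratios.
        cx, cy, w, h = boxes.unbind(-1)
        dx = torch.log((cx[:, None] - cx[None, :]).abs() / w[:, None] + 1e-3)
        dy = torch.log((cy[:, None] - cy[None, :]).abs() / h[:, None] + 1e-3)
        dw = torch.log(w[None, :] / w[:, None])
        dh = torch.log(h[None, :] / h[:, None])
        geo = torch.stack([dx, dy, dw, dh], dim=-1)          # (N, N, 4)

        bias = self.geo_mlp(geo).squeeze(-1)                 # (N, N) geometry bias
        attn = F.softmax(logits + bias, dim=-1)              # geometry-aware weights
        return attn @ v                                      # (N, dim) refined features

Given region features feats of shape (N, dim) and boxes of shape (N, 4), GeometryBiasedAttention(dim)(feats, boxes) returns geometry-refined region features of the same shape; in a captioning encoder such a layer would replace or augment plain self-attention over the detected objects.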
arXiv:2110.00335v1
fatcat:emucqxpc3rdfpeu3xoewpwuj2i