A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is
In recent years, transformer structures have been widely applied in image captioning with impressive performance. For good captioning results, the geometry and position relations of different visual objects are often thought of as crucial information. Aiming to further promote image captioning by transformers, this paper proposes an improved Geometry Attention Transformer (GAT) model. In order to further leverage geometric information, two novel geometry-aware architectures are designedarXiv:2110.00335v1 fatcat:emucqxpc3rdfpeu3xoewpwuj2i