A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2022; you can also visit the original URL.
The file type is application/pdf.
X-Linear Attention Networks for Image Captioning
2020
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Recent progress on fine-grained visual recognition and visual question answering has featured Bilinear Pooling, which effectively models the 2nd order interactions across multi-modal inputs. Nevertheless, there has not been evidence in support of building such interactions concurrently with attention mechanism for image captioning. In this paper, we introduce a unified attention block, the X-Linear attention block, that fully employs bilinear pooling to selectively capitalize on visual information
doi:10.1109/cvpr42600.2020.01098
dblp:conf/cvpr/PanYLM20
fatcat:nf5bki4675g5rn72nzyiraid6e