A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2022; you can also visit the original URL.
The file type is application/pdf.
Object Relational Graph With Teacher-Recommended Learning for Video Captioning
2020
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Taking full advantage of the information from both vision and language is critical for the video captioning task. Existing models lack adequate visual representation because they neglect the interactions between objects, and they lack sufficient training for content-related words because of the long-tailed problem. In this paper, we propose a complete video captioning system including both a novel model and an effective training strategy. Specifically, we propose an object relational graph (ORG) based encoder, which
doi:10.1109/cvpr42600.2020.01329
dblp:conf/cvpr/ZhangSY0WHZ20
fatcat:vxegivprwvgdhotyspwl3zo5ne