BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues
[article]
2020
arXiv
pre-print
Video-grounded dialogues are very challenging due to (i) the complexity of videos which contain both spatial and temporal variations, and (ii) the complexity of user utterances which query different segments and/or different objects in videos over multiple dialogue turns. However, existing approaches to video-grounded dialogues often focus on superficial temporal-level visual cues, but neglect more fine-grained spatial signals from videos. To address this drawback, we propose Bi-directional
arXiv:2010.10095v1
fatcat:jiipwofx3fcvhh2vomtjrhlxrm