A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2022; you can also visit the original URL.
Exploring Optical-Flow-Guided Motion and Detection-Based Appearance for Temporal Sentence Grounding
[article]
2022
arXiv
pre-print
Temporal sentence grounding aims to localize, in an untrimmed video, the segment that semantically matches a given sentence query. Most previous works focus on learning frame-level features of each whole frame in the entire video and directly matching them with the textual information. Such frame-level feature extraction hinders these methods from distinguishing ambiguous video frames with complicated contents and subtle appearance differences, thus limiting their performance.
arXiv:2203.02966v1
fatcat:ith2bpxlgbhlpjy4bky2dnzbqa