A Hierarchical Multi-Modal Encoder for Moment Localization in Video Corpus
[article]
2020
arXiv
pre-print
Identifying a short segment in a long video that semantically matches a text query is a challenging task with important potential applications in language-based video search, browsing, and navigation. Typical retrieval systems respond to a query with either a whole video or a pre-defined video segment, but it is challenging to localize undefined segments in untrimmed and unsegmented videos where exhaustively searching over all possible segments is intractable. The outstanding challenge is
arXiv:2011.09046v2
fatcat:xumwor3zgzferkj6kgcqsnreqy