Future-Supervised Retrieval of Unseen Queries for Live Video

Spencer Cappallo, Cees G.M. Snoek
2017 Proceedings of the 2017 ACM on Multimedia Conference - MM '17  
Live streaming video presents new challenges for retrieval and content understanding. Its live nature means that video representations should be relevant to current content, and not necessarily to past content. We investigate retrieval of previously unseen queries for live video content. Drawing from existing whole-video techniques, we focus on adapting image-trained semantic models to the video domain. We introduce the use of future frame representations as a supervision signal for learning temporally aware semantic representations on unlabeled video data. Additionally, we introduce an approach for broadening a query's representation within a preconstructed semantic space, with the aim of increasing overlap between embedded visual semantics and the query semantics. We demonstrate the efficacy of these contributions for unseen query retrieval on live videos. We further explore their applicability to tasks such as no-example whole-video action classification and no-example live video action prediction, and demonstrate state-of-the-art results.
doi:10.1145/3123266.3123437 dblp:conf/mm/CappalloS17 fatcat:t3wgpjthpfhnnbq6eeu56flsyu