Learn from Unlabeled Videos for Near-duplicate Video Retrieval

Xiangteng He, Yulin Pan, Mingqian Tang, Yiliang Lv, Yuxin Peng
2022 Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval  
Near-duplicate video retrieval (NDVR) aims to find copies or transformations of a query video in a massive video database. It plays an important role in many video-related applications, including copyright protection, tracing, and filtering. Video representation and similarity search are crucial to any video retrieval system. To derive an effective video representation, most video retrieval systems require a large amount of manually annotated training data, making it costly ... ent. In addition, most retrieval systems rely on frame-level features for video similarity search, which is expensive in both storage and search. To address these issues, we propose a video representation learning (VRL) approach. It first learns video representations from unlabeled videos via contrastive learning, avoiding the cost of manual annotation. It then uses a transformer structure to aggregate frame-level features into clip-level features, reducing both storage space and search complexity. The approach learns complementary and discriminative information from the interactions among clip frames, and acquires invariance to frame permutation and missing frames, which supports more flexible retrieval modes. Comprehensive experiments on two challenging near-duplicate video retrieval datasets, FIVR-200K and SVD, verify the effectiveness of the proposed VRL approach, which achieves the best retrieval performance in both accuracy and efficiency.
CCS CONCEPTS • Information systems → Video search. †Equal contribution.
doi:10.1145/3477495.3532010 fatcat:iafkdqw725egfem7f5gghrvkcu
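
The two ideas named in the abstract, aggregating frame-level features into one clip-level vector and training with a contrastive objective on unlabeled videos, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes identity query/key/value projections in place of learned transformer weights, mean pooling after attention, an InfoNCE-style loss, and a hypothetical `augment` function that shuffles frames and drops one. All function names and parameters are illustrative only.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate_clip(frames):
    """Self-attention over frames (no positional encoding), then mean pooling.

    Assumption: identity projections stand in for learned weights.
    Returns one unit-length clip vector instead of one vector per frame.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    attended = []
    for q in frames:
        weights = softmax([dot(q, k) / scale for k in frames])
        attended.append(
            [sum(w * v[i] for w, v in zip(weights, frames)) for i in range(dim)]
        )
    pooled = [sum(row[i] for row in attended) / len(attended) for i in range(dim)]
    return l2_normalize(pooled)

def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style loss: anchors[i] should match positives[i] over all positives."""
    loss = 0.0
    for i, a in enumerate(anchors):
        probs = softmax([dot(a, p) / temperature for p in positives])
        loss -= math.log(probs[i])
    return loss / len(anchors)

def augment(frames, rng):
    """Hypothetical augmentation: shuffle frame order and drop one frame."""
    view = frames[:]
    rng.shuffle(view)
    return view[:-1]

rng = random.Random(0)
# Four toy "videos", each with six 8-dimensional frame features.
videos = [[[rng.gauss(0, 1) for _ in range(8)] for _ in range(6)] for _ in range(4)]
anchors = [aggregate_clip(v) for v in videos]
positives = [aggregate_clip(augment(v, rng)) for v in videos]
loss = info_nce(anchors, positives)

# Without positional encodings, reversing the frame order leaves the clip vector unchanged.
clip_a = aggregate_clip(videos[0])
clip_b = aggregate_clip(videos[0][::-1])
```

In this sketch, permutation invariance follows directly from omitting positional encodings and pooling symmetrically. Robustness to missing frames is not exact here; it would have to be encouraged by training on views such as those produced by `augment`.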