Multi-task deep visual-semantic embedding for video thumbnail selection

Wu Liu, Tao Mei, Yongdong Zhang, Cherry Che, Jiebo Luo
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2015
Given the tremendous growth of online video, the video thumbnail, as the most common visual summary of video content, increasingly influences users' browsing and search experience. However, conventional methods for video thumbnail selection often fail to produce satisfying results because they ignore the side semantic information (e.g., title, description, and query) associated with the video. As a result, the selected thumbnail cannot always represent the video's semantics, and the click-through rate is adversely affected even when the retrieved videos are relevant. In this paper, we develop a multi-task deep visual-semantic embedding model that automatically selects query-dependent video thumbnails according to both visual and side information. Unlike most existing methods, the proposed approach employs a deep visual-semantic embedding model to directly compute the similarity between the query and candidate thumbnails by mapping both into a common latent semantic space, where even unseen query-thumbnail pairs can be correctly matched. In particular, we train the embedding model on large-scale, freely accessible click-through video and image data, using a multi-task learning strategy to holistically exploit the query-thumbnail relevance in these two highly related datasets. Finally, a thumbnail is selected by fusing its representativeness and query-relevance scores. Evaluations on a dataset of 1,000 query-thumbnail pairs labeled by 191 Amazon Mechanical Turk workers demonstrate the effectiveness of the proposed method.
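As a rough illustration of the core idea described in the abstract (not the authors' implementation), the sketch below shows a two-branch embedding that projects thumbnail visual features and query text features into a shared latent space, scores query-thumbnail relevance by cosine similarity in that space, and fuses the relevance score with a representativeness score to pick a thumbnail. All names, feature dimensions, and the linear fusion weight `alpha` are illustrative assumptions; the paper's actual architecture, multi-task ranking losses over click-through data, and fusion rule are given in the full text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class VisualSemanticEmbedding(nn.Module):
    """Two-branch embedding: map thumbnail (visual) features and query
    (text) features into a common latent space; relevance is the cosine
    similarity between the two projections (dimensions are assumptions)."""

    def __init__(self, visual_dim: int = 4096, text_dim: int = 300,
                 embed_dim: int = 256):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, embed_dim)  # thumbnail branch
        self.text_proj = nn.Linear(text_dim, embed_dim)      # query branch

    def relevance(self, visual_feats: torch.Tensor,
                  text_feats: torch.Tensor) -> torch.Tensor:
        v = F.normalize(self.visual_proj(visual_feats), dim=-1)
        t = F.normalize(self.text_proj(text_feats), dim=-1)
        return (v * t).sum(dim=-1)  # cosine similarity per query-thumbnail pair


def select_thumbnail(relevance_scores: torch.Tensor,
                     representativeness: torch.Tensor,
                     alpha: float = 0.5) -> int:
    """Fuse query relevance and representativeness (here a simple convex
    combination; the paper's exact fusion rule may differ) and return the
    index of the best candidate frame."""
    fused = alpha * relevance_scores + (1.0 - alpha) * representativeness
    return int(fused.argmax())


# Usage sketch: 20 candidate frames with CNN features, one query embedding.
model = VisualSemanticEmbedding()
frames = torch.randn(20, 4096)                 # candidate thumbnail features
query = torch.randn(1, 300).expand(20, -1)     # query broadcast to all frames
rel = model.relevance(frames, query)
best = select_thumbnail(rel, torch.rand(20))   # representativeness is stubbed
```

In a full system, the two projections would be trained with a ranking objective on click-through query-thumbnail and query-image pairs (the multi-task aspect), and the representativeness score would come from a visual-quality/centrality model rather than the random stub used above.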
doi:10.1109/cvpr.2015.7298994 dblp:conf/cvpr/LiuMZCL15