Video Summarization Using Highlight Detection and Pairwise Deep Ranking Model

M. Sridevi, Mayuri Kharde
2020 Procedia Computer Science  
With mobile phones and camera-enabled devices becoming pervasive and user-friendly, a large number of videos are shot every day and uploaded to social media and video streaming websites, making video an important tool for disseminating information. Searching and analysing such a large volume of videos is extremely tedious. Automatic video summarization therefore produces a short, informative summary of a long video, which is useful for indexing and classifying such videos in a video database. Video summarization is a very challenging task. This work generates a video summary by modelling a two-stream architecture with a deep convolutional neural network (DCNN) in each stream, extracting both the spatial and the temporal information of a video. A two-dimensional convolutional neural network (2D CNN) exploits spatial information, whereas a three-dimensional convolutional neural network (3D CNN) exploits temporal information, to generate highlight scores for segments of the video. The segment scores from the two streams are fused to detect the highlight segments. Furthermore, because a highlight score expresses only a relative level of user interest in a video, the DCNN in each stream is trained with a pairwise deep ranking model, which is optimized so that a highlight segment receives a higher score than a non-highlight segment. The detected highlight segments are then used to summarize the video.
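
The scoring pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state the exact loss, fusion rule, or selection rule, so three assumptions are made here: a hinge-style pairwise ranking loss with a margin, a weighted average for fusing the two streams, and top-k selection of highlight segments. All function names and parameters (`pairwise_ranking_loss`, `fuse_scores`, `select_highlights`, `margin`, `weight`, `k`) are hypothetical, and the per-segment scores are taken as given rather than produced by a 2D CNN or 3D CNN.

```python
from typing import List, Sequence


def pairwise_ranking_loss(highlight_score: float,
                          non_highlight_score: float,
                          margin: float = 1.0) -> float:
    """Hinge-style pairwise ranking loss (assumed form, not from the paper).

    The loss is zero once the highlight segment outscores the
    non-highlight segment by at least `margin`, and grows linearly otherwise.
    """
    return max(0.0, margin - highlight_score + non_highlight_score)


def fuse_scores(spatial: Sequence[float],
                temporal: Sequence[float],
                weight: float = 0.5) -> List[float]:
    """Late fusion of per-segment scores from the two streams.

    A weighted average is an assumption; `weight` is the share given
    to the spatial (2D CNN) stream.
    """
    if len(spatial) != len(temporal):
        raise ValueError("both streams must score the same segments")
    return [weight * s + (1.0 - weight) * t for s, t in zip(spatial, temporal)]


def select_highlights(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k highest-scoring segments in temporal order."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return sorted(top)


if __name__ == "__main__":
    # Made-up scores for four segments, for illustration only.
    spatial_scores = [0.2, 0.9, 0.4, 0.7]
    temporal_scores = [0.1, 0.8, 0.5, 0.9]
    fused = fuse_scores(spatial_scores, temporal_scores)
    print(select_highlights(fused, k=2))
```

During training, a loss of this kind would be summed over pairs of highlight and non-highlight segments, so only the ordering of the scores matters and not their absolute values.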
doi:10.1016/j.procs.2020.03.203 fatcat:n3h32rudx5d7rkm5orazo6hqjy