MMToC

Arijit Biswas, Ankit Gandhi, Om Deshmukh
2015 Proceedings of the 23rd ACM international conference on Multimedia - MM '15  
In this paper we propose a multimodal method called MM-ToC for automatically creating a table of content for educational videos. MMToC defines and quantifies word saliency for visual words extracted from the slides and spoken words obtained from the speech transcript. The saliency scores from these two modalities are combined to obtain a ranked list of salient words. These ranked words along with their saliency scores are used to formulate a topic segmentation cost function. The cost function
more » ... optimized using a dynamic program framework to obtain the topic segments of the video. These segments are labelled with their corresponding topic names for creating the table of content. We perform experiments on 24 hours of lectures spread across 23 videos ranging over 20-75 minutes duration each. We compare the proposed method with LDA-based video segmentation approaches and show that the proposed MMToC method is significantly better (F-score improvement of 0.19 and 0.24 on two datasets). We also perform a user study to demonstrate the effectiveness of MMToC for navigating educational videos. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org. MM'15, October 26 -30, 2015,
doi:10.1145/2733373.2806253 dblp:conf/mm/BiswasGD15 fatcat:xmqink3mfzbffetsby2yqsc4ue