Key Event Detection in Video using ASR and Visual Data

Niraj Shrestha, Aparna N. Venkitasubramanian, Marie-Francine Moens
2014. Proceedings of the Third Workshop on Vision and Language. Dublin City University and the Association for Computational Linguistics.
Multimedia data grow day by day, which makes it necessary to index them automatically and efficiently for fast retrieval, and more precisely to index them automatically with key events. In this paper, we present preliminary work on key event detection in British royal wedding videos using automatic speech recognition (ASR) and visual data. The system first automatically acquires key events of royal weddings from an external corpus such as Wikipedia, and then identifies those events in the ASR ... a. The system also models name and face alignment to identify the persons involved in the wedding events. We compare the results obtained with the ASR output with results obtained with subtitles. When detecting key events and their participants in the wedding videos, the error with ASR output is only slightly higher than with subtitles. This work is licenced under a Creative Commons Attribution 4.0 International License. License details: http://
doi:10.3115/v1/w14-5407, dblp:conf/acl-vl/ShresthaVM14, fatcat:gjy447zno5etnjo23zdztlmkzi
