Audio keyword generation for sports video analysis

Min Xu, Ling-Yu Duan, Liang-Tien Chia, Chang-sheng Xu
2004 Proceedings of the 12th annual ACM international conference on Multimedia - MULTIMEDIA '04  
Semantic sports video analysis has attracted considerable research interest, and audio cues have been shown to play an important role in semantic inference. To facilitate event detection using audio information, we have introduced the concept of an audio keyword (e.g. excited/plain commentator speech, excited/plain audience sound) to describe the game-specific sound associated with an event. In our previous work, we designed a hierarchical Support Vector Machine (SVM) classifier for audio keyword identification. However, that approach has two inherent weaknesses: 1) a frame-based SVM classifier does not incorporate any contextual information; 2) a robust recognizer relies on large amounts of training data when applied to videos of different sports games. In this demo, we present a flexible Hidden Markov Model (HMM)-based audio keyword generation system, motivated by the success of HMMs in speech recognition. Unlike frame-based SVM classification followed by majority voting, our HMM-based system treats an audio keyword as a continuous time series and uses hidden state transitions to capture context. Moreover, our system introduces an adaptation mechanism that tunes the initial HMMs (obtained from the available training data) with a small amount of data from a new sports game video to improve performance. Promising results have been demonstrated on tennis, soccer and basketball videos with a total length of 2 hours.
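The contrast drawn above between frame-level majority voting and sequence-level HMM scoring can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each audio frame has already been quantized to a discrete symbol (0 = low energy, 1 = high energy), and the model parameters in `MODELS` are hypothetical values chosen only for illustration. A real system would more likely use continuous features with trained emission densities, and the adaptation step mentioned in the abstract is not covered here.

```python
import math
from collections import Counter


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(obs, start, trans, emit):
    """Return log P(obs | model) for a discrete HMM via the forward algorithm."""
    n = len(start)

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # Initialisation: start probability times emission of the first symbol.
    alpha = [log(start[i]) + log(emit[i][obs[0]]) for i in range(n)]
    # Recursion: hidden state transitions carry context from frame to frame.
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[i] + log(trans[i][j]) for i in range(n)])
            + log(emit[j][o])
            for j in range(n)
        ]
    return log_sum_exp(alpha)


def classify_segment(obs, models):
    """Pick the audio keyword whose HMM gives the whole segment the highest score."""
    scores = {
        name: forward_log_likelihood(obs, *params)
        for name, params in models.items()
    }
    return max(scores, key=scores.get), scores


def majority_vote(frame_labels):
    """Frame-based baseline: the most frequent per-frame label wins."""
    return Counter(frame_labels).most_common(1)[0][0]


# Hypothetical two-state models: (start, transition, emission) probabilities.
MODELS = {
    "plain_commentator_speech": (
        [0.9, 0.1],
        [[0.9, 0.1], [0.5, 0.5]],
        [[0.9, 0.1], [0.4, 0.6]],
    ),
    "excited_commentator_speech": (
        [0.2, 0.8],
        [[0.5, 0.5], [0.1, 0.9]],
        [[0.6, 0.4], [0.1, 0.9]],
    ),
}

if __name__ == "__main__":
    segment = [1, 1, 1, 1, 1, 1, 1, 1]  # a sustained run of high-energy frames
    keyword, scores = classify_segment(segment, MODELS)
    print(keyword, scores)
```

The majority vote looks only at per-frame label counts, whereas the HMM score depends on the order of the frames through the transition probabilities, which is the contextual information the frame-based classifier lacks.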
doi:10.1145/1027527.1027702 dblp:conf/mm/XuDCX04 fatcat:xdznflw6qzeu3cfve4rgu43wie