A fusion scheme of visual and auditory modalities for event detection in sports video

Min Xu, Ling-Yu Duan, Chang-Sheng Xu, Qi Tian
2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)
In this paper, we propose an effective fusion scheme of visual and auditory modalities to detect events in sports video. The proposed scheme is built upon semantic shot classification, in which video shots are classified into several major or interesting classes, each with a clear semantic meaning. Within the major shot classes, we classify auditory signal segments (i.e., silence, hitting ball, applause, and commentator speech) with the goal of detecting events with strong semantic meaning. For instance, for tennis video, we have identified five interesting events: serve, reserve, ace, return, and score. Since we have developed a unified framework for semantic shot classification in sports videos and a set of audio mid-level representations learned with supervised methods, the proposed fusion scheme can be easily adapted to a new sports game. We are extending this fusion scheme to three additional typical sports: basketball, volleyball, and soccer. Correctly detected sports video events will greatly facilitate further structural and temporal analysis, such as sports video skimming and table-of-contents generation.
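To make the two-stage idea concrete, here is a minimal sketch of rule-based fusion, assuming each shot already carries a visual class label and a time-ordered list of audio segment labels. The four audio classes come from the abstract. The shot label `court_view`, the function `detect_events`, and the event rules themselves (first hit is a serve, later hits are returns, applause after a single hit is an ace, applause after a rally is a score) are hypothetical illustrations, not the authors' actual rules. The "reserve" event is not modelled.

```python
from dataclasses import dataclass
from typing import List

# Audio segment classes named in the abstract.
AUDIO_CLASSES = {"silence", "hitting_ball", "applause", "speech"}


@dataclass
class Shot:
    shot_class: str    # visual shot class, e.g. "court_view" (hypothetical label)
    audio: List[str]   # time-ordered audio segment labels within the shot


def detect_events(shot: Shot) -> List[str]:
    """Hypothetical fusion rules: audio cues are interpreted only inside court-view shots."""
    for label in shot.audio:
        if label not in AUDIO_CLASSES:
            raise ValueError(f"unknown audio class: {label}")
    # Visual gate: other shot classes (close-ups, crowd, etc.) yield no play events here.
    if shot.shot_class != "court_view":
        return []
    hits = [i for i, a in enumerate(shot.audio) if a == "hitting_ball"]
    if not hits:
        return []
    events = ["serve"]                      # first ball hit in the shot
    events.extend("return" for _ in hits[1:])  # every later hit is a return
    # Applause after the last hit signals a point; a single hit means an ace.
    if "applause" in shot.audio[hits[-1] + 1:]:
        events.append("ace" if len(hits) == 1 else "score")
    return events
```

For example, `detect_events(Shot("court_view", ["silence", "hitting_ball", "applause"]))` returns `["serve", "ace"]`. The visual class acts as a gate so that the same audio cue (e.g., applause during a crowd shot) is not misread as a play event; a learned classifier could replace these hand-written rules.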
doi:10.1109/icme.2003.1220922 dblp:conf/icmcs/XuDXT03 fatcat:6gxfz64ayzb7nfg65445zjtwbm