Detecting Prominent Content in Unstructured Audio using Intensity-based Attack/release Patterns
발생/소멸 패턴을 이용한 비정형 혼합 오디오의 주성분 검출

Samuel Kim
2013 Journal of the Institute of Electronics and Information Engineers  
Defining prominent audio content as the most informative audio content from the users' perspective within a given unstructured audio segment, we propose simple but robust intensity-based attack/release pattern features to detect it. We also propose a web-based annotation procedure to collect users' subjective perception and annotate 18 hours of video clips across various genres, such as cartoons, movies, and news. Experiments with a linear classification method whose models are trained for speech, music, and sound effects demonstrate promising results that vary across program genres (e.g., 86.7% weighted accuracy for speech-oriented talk shows and 49.3% weighted accuracy for action movies).
doi:10.5573/ieek.2013.50.12.224 fatcat:b367aryc2jcedeyrmidglc33aa