Audio-based affect detection in web videos

Dave Chisholm, Behjat Siddiquie, Ajay Divakaran, Elizabeth Shriberg
2015 IEEE International Conference on Multimedia and Expo (ICME)
We present a new technique for detecting audio concepts in web content and outline its applications to video sequence parsing. Our focus is primarily on affective concepts; to study them, we have collected a new dataset, "Rallying a Crowd", consisting of videos in which a speaker is persuading a crowd. We develop new classifiers for graded levels of arousal in speech, as well as for crowd noise and music, and demonstrate their effectiveness on web content. These
... es achieve high detection accuracy (58.2%) for affective concepts on this new dataset and outperform state-of-the-art techniques for semantic concepts on a previously collected dataset (36.8% versus 33.1%). We also develop a new audio sequence segmentation technique that lets us rapidly classify subsections of test sequence audio into the aforementioned audio classes. We are thus able to robustly address both the detection of affective concepts in highly variable web content and the computational challenge of fast classification needed for web-scale processing.
doi:10.1109/icme.2015.7177525 dblp:conf/icmcs/ChisholmSDS15 fatcat:6nz6lnlopncwvmr4n2bskme7lq