Exploring Multimodal Visual Features for Continuous Affect Recognition

Bo Sun, Siming Cao, Liandong Li, Jun He, Lejun Yu
2016 Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge - AVEC '16  
This paper presents our work in the Emotion Sub-Challenge of the 6th Audio/Visual Emotion Challenge and Workshop (AVEC 2016), whose goal is to explore the use of audio, visual, and physiological signals to continuously predict the values of the emotion dimensions (arousal and valence). As visual features are very important in emotion recognition, we try a variety of handcrafted and deep visual features. For each video clip, besides the baseline features, we extract multi-scale Dense SIFT features (MSDF) and several types of Convolutional Neural Network (CNN) features to recognize the expression phases of the current frame. We train a linear Support Vector Regression (SVR) model for each kind of feature on the RECOLA dataset. Multimodal fusion of these modalities is then performed with a multiple linear regression model. The final Concordance Correlation Coefficients (CCC) we obtained on the development set are 0.824 for arousal and 0.718 for valence; on the test set, 0.683 for arousal and 0.642 for valence.
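The pipeline described above can be sketched in a few lines. The sketch below is illustrative only (the function names `ccc` and `fuse_predictions` are ours, not the authors'): it shows the standard Concordance Correlation Coefficient used as the challenge metric, and a least-squares multiple linear regression fusion of per-modality predictions, assuming each modality's SVR outputs are already available as arrays.

```python
import numpy as np

def ccc(y_true, y_pred):
    """Concordance Correlation Coefficient (Lin's CCC):
    2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mean_t, mean_p = y_true.mean(), y_pred.mean()
    var_t, var_p = y_true.var(), y_pred.var()
    cov = ((y_true - mean_t) * (y_pred - mean_p)).mean()
    return 2.0 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)

def fuse_predictions(preds, target):
    """Multiple-linear-regression fusion (a hypothetical sketch):
    fit weights and a bias over per-modality predictions by
    ordinary least squares, then return the fused prediction.
    `preds` has shape (n_samples, n_modalities)."""
    X = np.hstack([preds, np.ones((len(target), 1))])  # add intercept column
    w, *_ = np.linalg.lstsq(X, target, rcond=None)
    return X @ w
```

A perfect prediction gives CCC = 1, and a prediction that is anti-correlated with the target with matching mean and variance gives CCC = -1; the fusion weights are learned on the development set and applied unchanged to the test set.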
doi:10.1145/2988257.2988270 dblp:conf/mm/SunCLHY16 fatcat:cxxsk3jw4zhipgmovqmgnmnuz4