Multi-modal Dimensional Emotion Recognition using Recurrent Neural Networks

Shizhe Chen, Qin Jin
Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge (AVEC '15), 2015
Emotion recognition has been an active research area with both wide applications and big challenges. This paper presents our effort for the Audio/Visual Emotion Challenge (AVEC 2015), whose goal is to explore the use of audio, visual, and physiological signals to continuously predict the values of the emotion dimensions (arousal and valence). Our system applies Recurrent Neural Networks (RNN) to model temporal information. We explore various aspects to improve prediction performance, including: the dominant modalities for arousal and valence prediction, the duration of features, novel loss functions, the directions of Long Short Term Memory (LSTM), multi-task learning, and different structures for early feature fusion and late fusion. The best settings are chosen according to performance on the development set. Competitive experimental results compared with the baseline show the effectiveness of the proposed methods.
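The abstract gives no implementation detail, so the following is only a minimal PyTorch sketch of the kind of model it describes: a bidirectional LSTM that regresses arousal and valence per frame, with two output heads (a simple multi-task setup) and a loss based on the concordance correlation coefficient, the AVEC 2015 evaluation metric. The feature dimension, hidden size, head structure, and choice of CCC as a training loss are illustrative assumptions, not the authors' settings.

```python
import torch
import torch.nn as nn

class EmotionLSTM(nn.Module):
    """Hypothetical bidirectional LSTM regressor over frame-level features.

    Not the authors' exact architecture: feature dimension, hidden size,
    and the two-head multi-task output are illustrative choices.
    """
    def __init__(self, feat_dim=100, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # Separate linear heads for arousal and valence (multi-task setup).
        self.arousal_head = nn.Linear(2 * hidden_dim, 1)
        self.valence_head = nn.Linear(2 * hidden_dim, 1)

    def forward(self, x):
        # x: (batch, time, feat_dim) -> per-frame arousal/valence predictions
        h, _ = self.lstm(x)
        return self.arousal_head(h).squeeze(-1), self.valence_head(h).squeeze(-1)

def ccc_loss(pred, gold):
    """1 - concordance correlation coefficient; one possible regression loss,
    aligned with the AVEC evaluation metric (the paper's exact losses differ)."""
    pred_mean, gold_mean = pred.mean(), gold.mean()
    pred_var = ((pred - pred_mean) ** 2).mean()
    gold_var = ((gold - gold_mean) ** 2).mean()
    cov = ((pred - pred_mean) * (gold - gold_mean)).mean()
    ccc = 2 * cov / (pred_var + gold_var + (pred_mean - gold_mean) ** 2 + 1e-8)
    return 1.0 - ccc

# Toy usage with random features standing in for audio/visual descriptors.
model = EmotionLSTM()
feats = torch.randn(4, 50, 100)            # 4 sequences, 50 frames each
arousal, valence = model(feats)
loss = ccc_loss(arousal, torch.randn(4, 50)) + ccc_loss(valence, torch.randn(4, 50))
loss.backward()
```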
doi:10.1145/2808196.2811638 dblp:conf/mm/ChenJ15