Multimodal and Multiresolution Depression Detection from Speech and Facial Landmark Features

Md Nasir, Arindam Jati, Prashanth Gurunath Shivakumar, Sandeep Nallan Chakravarthula, Panayiotis Georgiou
2016 Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge - AVEC '16  
Automatic classification of depression using audiovisual cues can help towards its objective diagnosis. In this paper, we present a multimodal depression classification system as a part of the 2016 Audio/Visual Emotion Challenge and Workshop (AVEC2016). We investigate a number of audio and video features for classification with different fusion techniques and temporal contexts. In the audio modality, Teager energy cepstral coefficients (TECC) outperform standard baseline features, while the ... accuracy is achieved with i-vector modelling based on MFCC features. On the other hand, polynomial parameterization of facial landmark features achieves the best performance among all systems and outperforms the best baseline system as well.
Keywords: Multimodal signal processing, behavioral signal processing (BSP), depression, Teager energy operator, i-vector, facial landmark, fusion
doi:10.1145/2988257.2988261 dblp:conf/mm/NasirJSCG16 fatcat:33geyegcgfcclgnmufbc5f32iq