MuLER: Multiplet-Loss for Emotion Recognition

Anwer Slimi, Mounir Zrigui, Henri Nicolas
2022 Proceedings of the 2022 International Conference on Multimedia Retrieval  
With the rise of human-machine interaction, machines need to understand humans better in order to respond appropriately. To improve communication and interaction, it would therefore be ideal for machines to detect human emotions automatically. Speech Emotion Recognition (SER) has been the focus of many studies in the past few years. However, existing approaches can be considered poor in accuracy and must be improved. In our work, we propose a new loss function that aims to encode speech utterances instead of classifying them directly, as the majority of existing models do. The encoding is done in such a way that utterances with the same label have similar encodings. The encoded utterances were tested on two datasets, and we obtained 88.19% accuracy on the RAVDESS (Ryerson Audiovisual Database of Emotional Speech and Song) dataset and 91.66% accuracy on the RML (Ryerson Multimedia Research Lab) dataset.
CCS CONCEPTS • Computing methodologies → Machine learning; Neural networks.
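The abstract does not give the loss formula, so the following is only an illustrative sketch of the general idea of pulling same-label encodings together and pushing different-label encodings apart. It assumes a generic multi-negative extension of the triplet hinge loss, which may differ from the MuLER formulation; the function names, the margin value, and the toy vectors are all hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def multi_negative_hinge_loss(anchor, positive, negatives, margin=1.0):
    """Sum of triplet-style hinge terms, one per negative encoding.

    Assumed formulation for illustration only; not taken from the paper.
    Each term is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    return sum(
        max(0.0, d_pos - euclidean(anchor, neg) + margin) for neg in negatives
    )


# Toy 2-D encodings (made-up values, not real speech features).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]                  # same emotion label as the anchor
negatives = [[3.0, 4.0], [0.0, 1.5]]   # encodings with other emotion labels

loss = multi_negative_hinge_loss(anchor, positive, negatives, margin=1.0)
print(loss)  # 0.5: only the second negative violates the margin
```

In this toy case the first negative is already far enough from the anchor and contributes nothing, while the second sits too close and is penalized. Minimizing such a loss over many utterances would encourage encodings that cluster by label.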
doi:10.1145/3512527.3531406 fatcat:5rntsyduxba63jhq74wlg4tuea