Improving Emotion Classification through Variational Inference of Latent Variables

Srinivas Parthasarathy, Viktor Rozgic, Ming Sun, Chao Wang
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Conventional models for emotion recognition from the speech signal are trained in a supervised fashion using speech utterances with emotion labels. In this study we hypothesize that the speech signal depends on multiple latent variables, including the emotional state, age, gender, and speech content. We propose an Adversarial Autoencoder (AAE) to perform variational inference over these latent variables and reconstruct the input feature representations. Reconstruction of feature representations is used as an auxiliary task to aid the primary emotion recognition task. Experiments on the IEMOCAP dataset demonstrate that the auxiliary learning tasks improve emotion classification accuracy compared to a baseline supervised classifier. Further, we demonstrate that the proposed learning approach can be used for end-to-end speech emotion recognition, as it is applicable to models that operate on frame-level inputs.
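To make the multi-task setup concrete, the sketch below shows an adversarial autoencoder whose encoder is shared between an emotion classifier and a feature-reconstruction decoder, with the latent code adversarially regularized toward a prior. This is a minimal PyTorch illustration, not the authors' implementation: the layer sizes, four-class emotion setup, Gaussian prior, and names such as `Encoder` and `training_step` are all illustrative assumptions.

```python
# A minimal sketch (assumed details, not the paper's exact architecture):
# an adversarial autoencoder whose encoder is shared with an emotion
# classifier, so feature reconstruction and adversarial regularization
# of the latent code act as auxiliary tasks for the primary classifier.
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM, LATENT_DIM, NUM_EMOTIONS = 40, 16, 4  # illustrative sizes

class Encoder(nn.Module):
    """Shared encoder: frame-level features -> latent code z + emotion logits."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(FEAT_DIM, 64), nn.ReLU())
        self.to_latent = nn.Linear(64, LATENT_DIM)
        self.to_emotion = nn.Linear(64, NUM_EMOTIONS)

    def forward(self, x):
        h = self.body(x)
        return self.to_latent(h), self.to_emotion(h)

encoder = Encoder()
decoder = nn.Sequential(        # reconstructs features from [z, emotion posterior]
    nn.Linear(LATENT_DIM + NUM_EMOTIONS, 64), nn.ReLU(),
    nn.Linear(64, FEAT_DIM))
discriminator = nn.Sequential(  # adversarially matches q(z|x) to the prior p(z)
    nn.Linear(LATENT_DIM, 32), nn.ReLU(),
    nn.Linear(32, 1))
bce = nn.BCEWithLogitsLoss()

def training_step(x, y, opt_ae, opt_d):
    """One update: primary classification + auxiliary reconstruction,
    followed by the AAE discriminator step (Gaussian prior assumed)."""
    # Autoencoder/classifier phase.
    z, logits = encoder(x)
    x_hat = decoder(torch.cat([z, torch.softmax(logits, dim=-1)], dim=-1))
    loss_ae = (F.mse_loss(x_hat, x)                     # auxiliary: reconstruction
               + F.cross_entropy(logits, y)             # primary: emotion labels
               + bce(discriminator(z), torch.ones(len(x), 1)))  # fool the critic
    opt_ae.zero_grad()
    loss_ae.backward()
    opt_ae.step()

    # Discriminator phase: prior samples are "real", encoded codes are "fake".
    z_prior = torch.randn(len(x), LATENT_DIM)
    loss_d = (bce(discriminator(z_prior), torch.ones(len(x), 1))
              + bce(discriminator(z.detach()), torch.zeros(len(x), 1)))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()
    return loss_ae.item(), loss_d.item()

# Example usage on random data standing in for frame-level speech features:
opt_ae = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
x, y = torch.randn(8, FEAT_DIM), torch.randint(0, NUM_EMOTIONS, (8,))
print(training_step(x, y, opt_ae, opt_d))
```

Conditioning the decoder on both the latent code and the emotion posterior mirrors the abstract's hypothesis that the signal depends on the emotional state alongside other latent factors; because the sketch operates on per-frame feature vectors, the same training loop applies to frame-level, end-to-end models.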
doi:10.1109/icassp.2019.8682823 dblp:conf/icassp/ParthasarathyRS19