A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is application/pdf.
Multi-Modal Speech Emotion Recognition Using Speech Embeddings and Audio Features
2019
The 15th International Conference on Auditory-Visual Speech Processing
unpublished
In this work, we propose a multi-modal emotion recognition model to improve the performance of speech emotion recognition systems. We use two parallel Bidirectional LSTM (Bi-LSTM) networks: an acoustic encoder (ENC1) and a speech embedding encoder (ENC2). The acoustic encoder is a Bi-LSTM that takes a sequence of speech features as input; the speech embedding encoder is also a Bi-LSTM, taking a sequence of speech embeddings as input, and the output hidden representation at the last time step of both the
doi:10.21437/avsp.2019-4
fatcat:stt2b6lvc5e4lhfo5cr6ksheyu
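The abstract is cut off before it describes how the two encoder outputs are combined, so the following is only a minimal sketch of the dual-encoder idea it describes: two parallel Bi-LSTMs whose last-time-step hidden states are concatenated into a joint utterance representation. The hidden size, feature dimensions (40-dim acoustic frames, 128-dim speech embeddings), class count, and concatenation-based fusion are all assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_lstm(d_in, d_h, rng):
    # Gate weights stacked as [input, forget, output, cell] along axis 0.
    return {
        "W": rng.normal(0, 0.1, (4 * d_h, d_in)),
        "U": rng.normal(0, 0.1, (4 * d_h, d_h)),
        "b": np.zeros(4 * d_h),
    }

def lstm_last_hidden(x, p):
    # Run a single-direction LSTM over x of shape (T, d_in) and
    # return the hidden state at the last time step, shape (d_h,).
    d_h = p["U"].shape[1]
    h = np.zeros(d_h)
    c = np.zeros(d_h)
    for x_t in x:
        z = p["W"] @ x_t + p["U"] @ h + p["b"]
        i = sigmoid(z[:d_h])              # input gate
        f = sigmoid(z[d_h:2 * d_h])       # forget gate
        o = sigmoid(z[2 * d_h:3 * d_h])   # output gate
        g = np.tanh(z[3 * d_h:])          # candidate cell state
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

def bilstm_encode(x, p_fwd, p_bwd):
    # Bi-LSTM encoder: concatenate the last hidden states of a
    # forward pass and a backward (time-reversed) pass.
    return np.concatenate([lstm_last_hidden(x, p_fwd),
                           lstm_last_hidden(x[::-1], p_bwd)])

rng = np.random.default_rng(0)
d_h = 32          # hidden size per direction (assumed)
num_classes = 4   # e.g. four emotion classes (assumed)

# ENC1: acoustic encoder over 40-dim frame-level features (assumed dims).
enc1 = [init_lstm(40, d_h, rng), init_lstm(40, d_h, rng)]
# ENC2: encoder over 128-dim pretrained speech embeddings (assumed dims).
enc2 = [init_lstm(128, d_h, rng), init_lstm(128, d_h, rng)]

acoustic = rng.normal(size=(50, 40))     # dummy utterance: 50 frames
embeddings = rng.normal(size=(50, 128))  # dummy embedding sequence

h1 = bilstm_encode(acoustic, *enc1)      # (2 * d_h,) = (64,)
h2 = bilstm_encode(embeddings, *enc2)    # (64,)
fused = np.concatenate([h1, h2])         # (128,) joint representation

# Softmax classifier over the fused representation (assumed fusion head).
W_out = rng.normal(0, 0.1, (num_classes, fused.shape[0]))
logits = W_out @ fused
probs = np.exp(logits - logits.max())
probs /= probs.sum()
print(fused.shape, probs.shape)
```

The design point the abstract emphasizes is that each modality gets its own Bi-LSTM, so the acoustic features and the speech embeddings are summarized independently before any fusion takes place.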