Temporal Attention Convolutional Network for Speech Emotion Recognition with Latent Representation
2020
Interspeech 2020
As a fundamental research topic in affective computing, speech emotion recognition (SER) has gained a lot of attention. Unlike common deep learning tasks, SER is restricted by the scarcity of emotional speech datasets. In this paper, the vector quantized variational autoencoder (VQ-VAE) is introduced and trained on massive unlabeled data in an unsupervised manner. Benefiting from the excellent invariant distribution encoding capability and discrete embedding space of VQ-VAE, the
doi:10.21437/interspeech.2020-1520
dblp:conf/interspeech/LiuLWGGD20
fatcat:eidctr5wgjd6ba75p7e4kgp6s4