Temporal Attention Convolutional Network for Speech Emotion Recognition with Latent Representation
As a fundamental research topic in affective computing, speech emotion recognition (SER) has gained a lot of attention. Unlike most deep learning tasks, SER is restricted by the scarcity of emotional speech datasets. In this paper, the vector quantized variational autoencoder (VQ-VAE) is introduced and trained on massive unlabeled data in an unsupervised manner. Benefiting from the excellent invariant distribution encoding capability and discrete embedding space of VQ-VAE, the
doi:10.21437/interspeech.2020-1520 dblp:conf/interspeech/LiuLWGGD20 fatcat:eidctr5wgjd6ba75p7e4kgp6s4