GAN-Based Data Generation for Speech Emotion Recognition

Sefik Emre Eskimez, Dimitrios Dimitriadis, Robert Gmyr, Kenichi Kumatani
Interspeech 2020
In this work, we propose a GAN-based method to generate synthetic data for speech emotion recognition. Specifically, we investigate the use of GANs for capturing the data manifold when the data is eyes-off, i.e., when we can train networks on the data but cannot copy it from the clients. We propose a CNN-based GAN with spectral normalization on both the generator and the discriminator, both of which are pre-trained on large unlabeled speech corpora. We show that our method provides better speech emotion recognition performance than a strong baseline. Furthermore, we show that even after the data on the client is lost, our model can generate similar data that can be used for model bootstrapping in the future. Although we evaluated our method on speech emotion recognition, it can be applied to other tasks.
Index Terms: speech emotion recognition, generative adversarial networks, data augmentation
In this section, we provide more detail on the neural network architecture, class-conditioning, pre-training, and fine-tuning.
doi:10.21437/interspeech.2020-2898 dblp:conf/interspeech/EskimezDGK20 fatcat:nsmqvjiwenewzdj6kwsavn5eo4
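
The abstract mentions spectral normalization on both the generator and the discriminator. As a rough illustration of that general technique only (not the authors' implementation), the sketch below estimates the largest singular value of a weight matrix by power iteration and divides the weights by it. The function names `spectral_norm` and `spectral_normalize`, the iteration count, and the toy matrix are assumptions made for this example; a convolution kernel would first have to be reshaped into a 2-D matrix, which is also assumed here rather than taken from the paper.

```python
import math
import random


def spectral_norm(weight, n_iters=50, seed=0):
    """Estimate the largest singular value of a 2-D matrix by power iteration.

    `weight` is a list of rows (hypothetical layout chosen for this sketch).
    """
    rng = random.Random(seed)
    rows, cols = len(weight), len(weight[0])
    # Random start vector; it almost surely overlaps the top singular vector.
    u = [rng.gauss(0.0, 1.0) for _ in range(rows)]
    sigma = 0.0
    for _ in range(n_iters):
        # v = W^T u, rescaled to unit length
        v = [sum(weight[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        v_norm = math.sqrt(sum(x * x for x in v)) or 1.0
        v = [x / v_norm for x in v]
        # u = W v; its length is the current singular-value estimate
        u = [sum(weight[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        sigma = math.sqrt(sum(x * x for x in u))
        u = [x / (sigma or 1.0) for x in u]
    return sigma


def spectral_normalize(weight, n_iters=50):
    """Return a copy of `weight` divided by its estimated spectral norm."""
    sigma = spectral_norm(weight, n_iters=n_iters)
    if sigma == 0.0:
        # An all-zero matrix has nothing to rescale.
        return [row[:] for row in weight]
    return [[w / sigma for w in row] for row in weight]


if __name__ == "__main__":
    toy = [[3.0, 0.0], [0.0, 1.0]]  # toy weights, not from the paper
    print(spectral_norm(toy))
    print(spectral_normalize(toy))
```

Deep learning frameworks usually keep the vector `u` between training steps and run only one or a few iterations per update; the sketch recomputes it from scratch for clarity.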