Learning to Model Prosodic and Spectral Features for Non-parallel Emotive Speech Conversion

Sri Harsha Dumpala, Sageev Oore
2021 Proceedings of the Canadian Conference on Artificial Intelligence  
Emotion conversion in speech has attracted recent attention owing to its importance in human-machine interaction and the current high quality of speech synthesis. Most existing approaches rely on parallel data, which is not available in many real-time applications. We propose a non-parallel emotion conversion approach based on the cycle generative adversarial network (cycleGAN) framework. We introduce new variants of cycleGAN that use recurrent neural networks and multi-kernel convolutional neural networks for modeling prosodic features along with spectral features for emotion conversion in speech. Subjective evaluation results show the effectiveness of our approach in converting natural speech, and also unseen synthesized speech samples, to different target emotive states.
doi:10.21428/594757db.930ce165 fatcat:bwji7p3cfbd2zcahhh2tcfheea
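The abstract names two ingredients without giving their details: the cycle-consistency idea at the core of cycleGAN, and multi-kernel convolutions over prosodic features. The NumPy sketch below illustrates both ideas under stated assumptions. It is not the authors' implementation. The kernel widths (3, 5, 9), the toy log-F0 contour, the L1 form of the cycle loss, and the fixed "generators" (simple shifts) are all hypothetical choices made for illustration. In an actual cycleGAN the two generators are trained networks, and the cycle term is combined with adversarial losses.

```python
import numpy as np


def multi_kernel_conv1d(x, kernels):
    """Hypothetical illustration: apply several 1-D kernels of different
    widths to the same feature sequence and stack the results as channels,
    so short- and long-range prosodic patterns are captured in parallel."""
    outputs = []
    for k in kernels:
        # mode="same" keeps each output aligned frame-by-frame with the input
        # (edges are zero-padded, so the first/last frames are attenuated).
        outputs.append(np.convolve(x, k, mode="same"))
    return np.stack(outputs, axis=0)  # shape: (num_kernels, num_frames)


def cycle_consistency_loss(x, g_ab, g_ba):
    """Mean L1 distance between x and its round trip A -> B -> A.
    A small value means the B -> A mapping undoes the A -> B mapping."""
    return float(np.mean(np.abs(g_ba(g_ab(x)) - x)))


# Toy log-F0 contour (assumed data, 50 frames).
log_f0 = np.log(np.linspace(100.0, 200.0, 50))

# Moving-average kernels of three widths (assumed, not from the paper).
kernels = [np.ones(w) / w for w in (3, 5, 9)]
feats = multi_kernel_conv1d(log_f0, kernels)

# Stand-in "generators": shift log-F0 up for A -> B and back down for B -> A.
loss_matched = cycle_consistency_loss(log_f0, lambda x: x + 0.1, lambda x: x - 0.1)
loss_mismatched = cycle_consistency_loss(log_f0, lambda x: x + 0.1, lambda x: x - 0.05)

print(feats.shape)
print(loss_matched, loss_mismatched)
```

With matched shifts the round trip recovers the input, so the cycle loss is essentially zero. With mismatched shifts it settles at the residual offset of 0.05. In training, this penalty is what pushes the two generators toward being approximate inverses of each other when no parallel utterance pairs are available.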