Modeling Prosodic Phrasing with Multi-Task Learning in Tacotron-based TTS
2020
IEEE Signal Processing Letters
Tacotron-based end-to-end speech synthesis has shown remarkable voice quality. However, the rendering of prosody in the synthesized speech remains to be improved, especially for long sentences, where prosodic phrasing errors can occur frequently. In this paper, we extend the Tacotron-based speech synthesis framework to explicitly model prosodic phrase breaks. We propose a multi-task learning scheme for Tacotron training that optimizes the system to predict both Mel spectrum and phrase
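The multi-task idea in the abstract can be illustrated with a minimal sketch of a combined training objective. This is not the paper's formulation, which is not given in this record: the L1 loss on the Mel spectrum, the binary cross-entropy on phrase-break labels, the `break_weight` value, and all function names are assumptions chosen for illustration.

```python
import math


def mel_l1_loss(mel_pred, mel_target):
    """Mean absolute error over all frames and Mel bins (assumed loss)."""
    total = sum(
        abs(p - t)
        for pred_frame, target_frame in zip(mel_pred, mel_target)
        for p, t in zip(pred_frame, target_frame)
    )
    count = sum(len(frame) for frame in mel_target)
    return total / count


def break_cross_entropy(break_probs, break_labels):
    """Mean binary cross-entropy over tokens (assumed loss).

    break_probs[i] is the predicted probability of a phrase break
    after token i; break_labels[i] is 1 for a break, 0 otherwise.
    """
    eps = 1e-12  # guards against log(0)
    total = sum(
        y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for p, y in zip(break_probs, break_labels)
    )
    return -total / len(break_labels)


def multitask_loss(mel_pred, mel_target, break_probs, break_labels,
                   break_weight=0.5):
    """Weighted sum of the two task losses; break_weight is hypothetical."""
    return (mel_l1_loss(mel_pred, mel_target)
            + break_weight * break_cross_entropy(break_probs, break_labels))
```

In a setup like this, the auxiliary phrase-break term would only shape training; whether and how the break predictor is used at synthesis time depends on the actual system and is not stated in the text above.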
doi:10.1109/lsp.2020.3016564
fatcat:q7rd6md5mnbrtpsyjpbaoiv5ou