A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2022; you can also visit the original URL.
The file type is application/pdf
.
Information Sieve: Content Leakage Reduction in End-to-End Prosody Transfer for Expressive Speech Synthesis
2021
Conference of the International Speech Communication Association
Expressive neural text-to-speech (TTS) systems incorporate a style encoder to learn a latent embedding as the style information. However, this embedding process may encode redundant textual information. This phenomenon is called content leakage. Researchers have attempted to resolve this problem by adding an ASR or other auxiliary supervision loss functions. In this study, we propose an unsupervised method called the "information sieve" to reduce the effect of content leakage in prosody
doi:10.21437/interspeech.2021-1011
dblp:conf/interspeech/DaiGWZ21
fatcat:aopxka7kwbbclbmutqjzpds4hq