A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020.
The file type is application/pdf.
Deep Encoder-Decoder Models for Unsupervised Learning of Controllable Speech Synthesis
[article]
2018
arXiv
pre-print
Generating versatile and appropriate synthetic speech requires control over the output expression separate from the spoken text. Important non-textual speech variation is seldom annotated, in which case output control must be learned in an unsupervised fashion. In this paper, we perform an in-depth study of methods for unsupervised learning of control in statistical speech synthesis. For example, we show that popular unsupervised training heuristics can be interpreted as variational inference
arXiv:1807.11470v3
fatcat:2hqdissmirbrvoh4iu353fhzmi