A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021.
The file type is application/pdf.
Towards Multi-Scale Style Control for Expressive Speech Synthesis
2021
Interspeech 2021
unpublished
This paper introduces a multi-scale speech style modeling method for end-to-end expressive speech synthesis. The proposed method employs a multi-scale reference encoder to extract both the global-scale utterance-level and the local-scale quasi-phoneme-level style features of the target speech, which are then fed into the speech synthesis model as an extension to the input phoneme sequence. During training, the multi-scale style model can be jointly trained with the speech synthesis model
doi:10.21437/interspeech.2021-947
fatcat:zqvojihaqnfdnbfgugr3ar4ucu