Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis

Ruibo Fu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi, Chunyu Qiang, Tao Wang
2020 Interspeech 2020  
Most current end-to-end speech synthesis systems assume that the input text is in a single language. However, code-switching, in which speakers switch between languages within the same utterance, occurs frequently in everyday speech, and building a large mixed-language speech database is difficult and uneconomical. In this paper, both a windowing technique and style token modeling are designed for code-switching end-to-end speech synthesis. To improve the consistency of speaking style in ... ngual situation, a dynamic attention reweighting soft windowing mechanism is proposed to ensure smooth transitions in code-switching, in contrast to conventional windowing techniques that use fixed constraints. To compensate for the shortage of mixed-language training data, a language dependent style token is designed for cross-language multi-speaker acoustic modeling, where both Mandarin and English monolingual data serve as the extended training set. Attention gating is proposed to adjust the style token dynamically based on the language and the attended context information. Experimental results show that the proposed methods improve intelligibility, naturalness and similarity.
doi:10.21437/interspeech.2020-1754 dblp:conf/interspeech/FuTWYQW20 fatcat:axfpfvlqe5e6fmfmscxzaso274
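
The abstract contrasts fixed-constraint windowing with a soft, reweighting form of attention windowing. As a rough illustration of that general idea only, the sketch below multiplies raw attention weights by a Gaussian window centered on the previously attended position and renormalizes, so distant positions are damped rather than cut off. The function names, the Gaussian shape, and the way the center and width are chosen are all assumptions made for illustration; the abstract does not specify the paper's actual reweighting rule or how its window is made dynamic.

```python
import math


def expected_position(weights):
    """Expected encoder index under an attention distribution."""
    return sum(i * w for i, w in enumerate(weights))


def soft_window(weights, center, sigma):
    """Reweight attention weights with a Gaussian window, then renormalize.

    Hypothetical helper: unlike a hard window, positions far from `center`
    are scaled down smoothly instead of being masked to zero.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    scaled = [
        w * math.exp(-0.5 * ((i - center) / sigma) ** 2)
        for i, w in enumerate(weights)
    ]
    total = sum(scaled)
    if total == 0.0:
        # Nothing survives the window; fall back to the raw weights.
        return list(weights)
    return [s / total for s in scaled]


# Attention at the previous decoder step, peaked at encoder index 2.
prev = [0.0, 0.1, 0.8, 0.1, 0.0, 0.0]
center = expected_position(prev)

# Raw attention at the current step, with a spurious jump back to index 0.
raw = [0.3, 0.05, 0.3, 0.3, 0.05, 0.0]

# sigma is fixed here; a dynamic scheme would choose it per decoder step.
new = soft_window(raw, center, sigma=1.0)
```

In this toy example the weight on the distant index 0 shrinks but stays nonzero, while the weights near the previous focus grow, which is the qualitative behavior a soft window is meant to provide.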