Speaker dependent expression predictor from text: Expressiveness and transplantation

Langzhou Chen, Norbert Braunschweiler, Mark J.F. Gales
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Automatically generating expressive speech from plain text is an important research topic in speech synthesis. Given the same text, different speakers may interpret and read it in very different ways, which implies that expression prediction from text is a speaker-dependent task. Previous work presented an integrated method for expression prediction and speech synthesis that can model the diverse expressions in human speech and build speaker-dependent expression predictors from text. This work extends that integrated method into a framework for speaker and expression factorization. The expressions generated by the speaker-dependent expression predictors can be represented in a shared expression space, and in this space the expressions can be transplanted between different speakers. The experimental results indicate that the proposed method can improve the expressiveness of the synthetic speech for different speakers. Furthermore, this work shows how important speaker-specific information is for the performance of the expression predictor from text.
Index Terms: expressive speech synthesis, hidden Markov model, cluster adaptive training, factorization, neural network
doi:10.1109/icassp.2014.6854065 dblp:conf/icassp/ChenBG14 fatcat:dvxjk5qqsvcypbzngqpghm44um