Constructing a Deep Neural Network Based Spectral Model for Statistical Speech Synthesis [chapter]

Shinji Takaki, Junichi Yamagishi
2016 Smart Innovation, Systems and Technologies  
This paper presents a technique for spectral modeling using a deep neural network (DNN) for statistical parametric speech synthesis. In statistical parametric speech synthesis systems, the spectrum is generally represented by low-dimensional spectral envelope parameters such as the cepstrum and LSPs, and these parameters are statistically modeled using hidden Markov models (HMMs) or DNNs. In this paper, we propose a statistical parametric speech synthesis system that models high-dimensional spectral amplitudes directly within the DNN framework to improve the modeling of spectral fine structures. We combine two DNNs, one for data-driven feature extraction from the spectral amplitudes, pre-trained as an auto-encoder, and another for acoustic modeling, into a large network and optimize them jointly, yielding a single DNN that directly synthesizes spectral amplitude information from linguistic features. Experimental results show that the proposed technique improves the quality of synthetic speech. Shinji Takaki was supported in part by NAVER Labs. The research leading to these results was partly funded by EP/J002526/1 (CAF).
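To make the described architecture concrete, the following is a minimal sketch in PyTorch of the general idea: an auto-encoder pre-trained on spectral amplitudes supplies a decoder, which is stacked on top of an acoustic network mapping linguistic features to the bottleneck representation, and the combined network is then fine-tuned jointly. All layer sizes, activations, and the optimizer here are hypothetical placeholders, not the paper's actual topology or training procedure.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions: 513-bin spectral amplitudes, 300-dim linguistic features,
# 64-dim bottleneck. These are illustrative, not the values used in the paper.
SPEC_DIM, LING_DIM, BOTTLENECK = 513, 300, 64

class SpectralAutoEncoder(nn.Module):
    """Auto-encoder pre-trained on spectral amplitudes (data-driven feature extraction)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(SPEC_DIM, 256), nn.Sigmoid(),
                                     nn.Linear(256, BOTTLENECK), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(BOTTLENECK, 256), nn.Sigmoid(),
                                     nn.Linear(256, SPEC_DIM))

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Acoustic model mapping linguistic features to the bottleneck representation.
acoustic_net = nn.Sequential(nn.Linear(LING_DIM, 512), nn.Tanh(),
                             nn.Linear(512, 512), nn.Tanh(),
                             nn.Linear(512, BOTTLENECK), nn.Sigmoid())

# Combined network: linguistic features -> bottleneck -> spectral amplitudes.
# The pre-trained decoder initializes the output layers; the whole stack is
# then optimized jointly by back-propagating the spectral reconstruction loss.
ae = SpectralAutoEncoder()  # in practice, load pre-trained weights here
combined = nn.Sequential(acoustic_net, ae.decoder)

optimizer = torch.optim.Adam(combined.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def train_step(linguistic_batch, spectra_batch):
    """One joint fine-tuning step over a mini-batch of aligned frames."""
    optimizer.zero_grad()
    pred = combined(linguistic_batch)
    loss = loss_fn(pred, spectra_batch)
    loss.backward()
    optimizer.step()
    return loss.item()
```

At synthesis time, the same `combined` network would be run on linguistic feature vectors to produce spectral amplitudes directly, without an intermediate low-dimensional spectral parameterization.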
doi:10.1007/978-3-319-28109-4_12