Wavelet analysis used in text-to-speech synthesis

M. Kobayashi, M. Sakamoto, T. Saito, Y. Hashimoto, M. Nishimura, K. Suzuki
IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 1998
This brief describes the use of wavelet analysis in the development of a Japanese text-to-speech (TTS) system for personal computers. The quality of synthesized speech is one of the most important features of any TTS system. Synthesis methods based on manipulating the speech signal spectrum (e.g., linear predictive coding synthesis and formant synthesis) produce comprehensible but unnatural-sounding output. The lack of naturalness commonly associated with these methods results from the use of oversimplified speech models, small synthesis unit inventories, and poor handling of text parsing for prosody control. We developed four new technologies to overcome these difficulties and improve the quality of output from TTS systems: accurate pitch mark determination by wavelet analysis, speech waveform generation using a modified time-domain pitch-synchronous overlap-add method, speech synthesis unit selection using a context-dependent clustering method, and efficient prosody control using a 3-phrase parser. All four technologies are described, with emphasis on those that rely on wavelet techniques.
doi:10.1109/82.718823
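The abstract does not say which wavelet or which peak-selection rule the authors used for pitch mark determination, so the following is only a minimal sketch of the general idea, not their method. It assumes an undecimated ("a trous") Haar transform over three dyadic scales, scores each sample by the summed detail magnitude (an abrupt excitation such as a glottal closure is strong at every scale, while the decaying oscillation after it is not), and picks peaks greedily with a minimum spacing of one shortest pitch period. The function names `haar_atrous_details` and `pitch_marks` and the parameters `max_f0` and `rel_threshold` are hypothetical.

```python
import math


def haar_atrous_details(x, levels):
    """Undecimated ("a trous") Haar wavelet transform (illustrative choice).

    Returns detail sequences d_1..d_levels, each as long as x. At level j
    the lag is 2**(j-1) samples; indices before the start of the signal
    are clamped to x[0]. Each level satisfies a_{j-1} = a_j + d_j.
    """
    approx = list(x)
    details = []
    for j in range(1, levels + 1):
        lag = 2 ** (j - 1)
        prev = [approx[max(n - lag, 0)] for n in range(len(approx))]
        details.append([(a - p) / 2.0 for a, p in zip(approx, prev)])
        approx = [(a + p) / 2.0 for a, p in zip(approx, prev)]
    return details


def pitch_marks(x, fs, max_f0=400.0, levels=3, rel_threshold=0.5):
    """Hypothetical pitch-mark picker based on multiscale detail energy.

    score[n] = sum over scales of |d_j[n]|. Samples are visited from the
    highest score down; a sample becomes a mark if its score is at least
    rel_threshold * max(score) and it lies at least fs / max_f0 samples
    from every mark already accepted.
    """
    details = haar_atrous_details(x, levels)
    score = [sum(abs(d[n]) for d in details) for n in range(len(x))]
    threshold = rel_threshold * max(score)
    min_gap = int(fs / max_f0)  # shortest pitch period, in samples
    marks = []
    for n in sorted(range(len(score)), key=lambda i: score[i], reverse=True):
        if score[n] < threshold:
            break  # every remaining sample scores lower still
        if all(abs(n - m) >= min_gap for m in marks):
            marks.append(n)
    return sorted(marks)


if __name__ == "__main__":
    # Synthetic "voiced" signal: a 100 Hz pulse train at fs = 8 kHz, each
    # pulse exciting a damped 500 Hz oscillation (a crude resonance).
    fs = 8000
    onsets = [20 + 80 * k for k in range(10)]
    x = [0.0] * 800
    for n0 in onsets:
        for m in range(800 - n0):
            x[n0 + m] += math.exp(-m / 15.0) * math.cos(2 * math.pi * 500 / fs * m)
    print(pitch_marks(x, fs))
```

On this synthetic input the picked marks should fall on or within a sample or two of the pulse onsets. Real speech would need more care than this sketch provides (unvoiced-segment rejection, compensating the different delays of each scale, and a smarter threshold), which is presumably where the authors' method differs.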