Post-Processing Using Speech Enhancement Techniques for Unit Selection and Hidden Markov Model Based Low Resource Language Marathi Text-to-Speech System

Sangramsing Kayte, Monica Mundada
2018 The 6th Intl. Workshop on Spoken Language Technologies for Under-Resourced Languages  
A speech signal captured by a distant microphone is generally contaminated by background noise, which severely degrades the audible quality and intelligibility of the observed signal. To resolve this issue, speech enhancement has been intensively studied. In this paper, we consider text-informed speech enhancement, where the enhancement process is guided by the corresponding text information, i.e. a correct transcription of the target utterance. The proposed Unit Selection Synthesis (USS) and Hidden Markov Model (HMM)-based frameworks are motivated by recent success in Text-to-Speech (TTS) research. The primary aim of the study is to improve the quality of speech synthesized with the USS and HMM methods, which are used to build a low-resource Marathi TTS system, by applying speech enhancement techniques. Taking advantage of the ability of USS and HMM to use disparate features at the inference stage, the proposed method infers the clean speech features by jointly using the observed signal and widely used TTS features derived from the corresponding text. We first introduce the background and the details of the proposed method for the low-resource Marathi language. Then, we show how the text information can be naturally integrated into speech enhancement by utilizing USS and HMM, and how this improves enhancement performance on synthesized speech. The spectral subtraction method is used to remove noise from the synthesized speech and improve its quality. The spectral parameters of both methods show improvement in the enhanced speech.
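The spectral subtraction step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' configuration, so everything specific here is an assumption made for illustration: the function name `spectral_subtraction`, the frame length and hop size, the estimate of the noise spectrum from the first few frames (assumed to contain no speech), and the spectral floor that limits over-subtraction.

```python
import numpy as np

def spectral_subtraction(noisy, frame_len=512, hop=256, noise_frames=5, floor=0.01):
    """Basic magnitude spectral subtraction (illustrative, not the paper's exact setup).

    The noise magnitude spectrum is estimated as the mean over the first
    `noise_frames` frames, which are assumed to contain noise only.
    """
    noisy = np.asarray(noisy, dtype=float)
    window = np.hanning(frame_len)

    # Zero-pad so the signal is covered by a whole number of frames.
    n_frames = 1 + int(np.ceil(max(len(noisy) - frame_len, 0) / hop))
    padded_len = (n_frames - 1) * hop + frame_len
    x = np.zeros(padded_len)
    x[:len(noisy)] = noisy

    # Short-time analysis: windowed frames -> magnitude and phase spectra.
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)

    # Subtract the noise estimate; keep a small fraction of the original
    # magnitude as a floor so no bin becomes negative.
    noise_mag = mag[:noise_frames].mean(axis=0)
    clean_mag = np.maximum(mag - noise_mag, floor * mag)

    # Resynthesize with the noisy phase and weighted overlap-add.
    clean_frames = np.fft.irfft(clean_mag * np.exp(1j * phase),
                                n=frame_len, axis=1)
    out = np.zeros(padded_len)
    norm = np.zeros(padded_len)
    for i in range(n_frames):
        seg = slice(i * hop, i * hop + frame_len)
        out[seg] += clean_frames[i] * window
        norm[seg] += window ** 2
    out /= np.maximum(norm, 1e-8)
    return out[:len(noisy)]
```

Calling `spectral_subtraction(waveform)` on a one-dimensional array of samples returns an enhanced waveform of the same length. Reusing the noisy phase is the usual simplification in spectral subtraction, and a more careful system would likely track the noise estimate over time rather than fixing it from the leading frames.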
doi:10.21437/sltu.2018-20 dblp:conf/sltu/KayteM18 fatcat:wvwmwm6ywjdizgfbom7xj7hbsa