A Speaker-Dependent WaveNet for Voice Conversion with Non-Parallel Data

Xiaohai Tian, Eng Siong Chng, Haizhou Li
Interspeech 2019
In a typical voice conversion system, a vocoder is commonly used for speech-to-features analysis and features-to-speech synthesis. However, the vocoder can be a source of speech quality degradation. This paper presents a novel approach to voice conversion with non-parallel training data using WaveNet. Instead of reconstructing speech from intermediate features, the proposed approach uses WaveNet to map Phonetic PosteriorGrams (PPGs) directly to waveform samples. In this way, we avoid the estimation errors arising from vocoding and feature conversion. Additionally, as PPGs are assumed to be speaker independent, the proposed approach also reduces the feature mismatch problem in WaveNet vocoder-based solutions. Experiments conducted on the CMU-ARCTIC database show that the proposed approach significantly outperforms the traditional vocoder and WaveNet vocoder baselines in terms of speech quality.
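The abstract describes a WaveNet that maps PPGs directly to waveform samples. The following is a minimal NumPy sketch of one common way to wire such local conditioning, offered as an illustration rather than the authors' implementation. Frame-level PPGs are upsampled to the sample rate by repetition and added inside gated, dilated causal convolutions. The output is a per-sample distribution over quantized amplitudes. Everything specific here is an assumption: the hop size (80 samples), the 40 phone classes, the channel width, the dilation pattern, the 256-way categorical output, and all function names are hypothetical. The weights are random, so the output is untrained.

```python
import numpy as np


def upsample_ppg(ppg, hop_length):
    """Nearest-neighbour upsampling: repeat each frame-level PPG vector
    hop_length times so every waveform sample gets a conditioning vector."""
    return np.repeat(ppg, hop_length, axis=0)


def causal_dilated_conv(x, w, dilation):
    """x: (T, C_in), w: (K, C_in, C_out). Output at time t only sees
    x[t], x[t - dilation], ..., x[t - (K - 1) * dilation] (causal)."""
    K, c_in, c_out = w.shape
    T = x.shape[0]
    pad = (K - 1) * dilation
    x_pad = np.concatenate([np.zeros((pad, c_in)), x], axis=0)
    y = np.zeros((T, c_out))
    for k in range(K):
        start = k * dilation  # tap k looks back (K - 1 - k) * dilation samples
        y += x_pad[start:start + T] @ w[k]
    return y


def gated_layer(x, cond, p, dilation):
    """Gated activation with local conditioning on the upsampled PPG:
    z = tanh(W_f * x + V_f c) * sigmoid(W_g * x + V_g c)."""
    f = causal_dilated_conv(x, p["w_f"], dilation) + cond @ p["v_f"]
    g = causal_dilated_conv(x, p["w_g"], dilation) + cond @ p["v_g"]
    z = np.tanh(f) / (1.0 + np.exp(-g))  # tanh(f) * sigmoid(g)
    return x + z @ p["w_res"], z @ p["w_skip"]  # residual output, skip output


def init_params(rng, channels, num_phones, num_classes, dilations, K=2):
    """Small random weights for a hypothetical, untrained model."""
    s = 0.1
    layers = [{
        "w_f": s * rng.standard_normal((K, channels, channels)),
        "w_g": s * rng.standard_normal((K, channels, channels)),
        "v_f": s * rng.standard_normal((num_phones, channels)),
        "v_g": s * rng.standard_normal((num_phones, channels)),
        "w_res": s * rng.standard_normal((channels, channels)),
        "w_skip": s * rng.standard_normal((channels, channels)),
    } for _ in dilations]
    return {
        "w_in": s * rng.standard_normal((K, 1, channels)),
        "layers": layers,
        "w_out": s * rng.standard_normal((channels, num_classes)),
    }


def wavenet_forward(prev_samples, cond, params, dilations):
    """prev_samples: (T, 1) waveform shifted right by one sample, so row t of
    the output is a distribution over sample t given samples < t and the PPGs."""
    x = causal_dilated_conv(prev_samples, params["w_in"], 1)
    skip = np.zeros_like(x)
    for p, d in zip(params["layers"], dilations):
        x, s = gated_layer(x, cond, p, d)
        skip += s
    logits = np.maximum(skip, 0.0) @ params["w_out"]
    logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
num_frames, num_phones, hop = 10, 40, 80   # hypothetical sizes
ppg = rng.random((num_frames, num_phones))
ppg /= ppg.sum(axis=1, keepdims=True)      # each row is a posterior over phone classes
cond = upsample_ppg(ppg, hop)              # (800, 40): one PPG vector per sample
dilations = [1, 2, 4, 8]
params = init_params(rng, channels=16, num_phones=num_phones,
                     num_classes=256, dilations=dilations)
prev = np.concatenate([np.zeros((1, 1)), rng.uniform(-1, 1, (cond.shape[0] - 1, 1))])
probs = wavenet_forward(prev, cond, params, dilations)
print(probs.shape)  # (800, 256)
```

In this sketch the only input besides past samples is the PPG sequence, with no speaker-dependent acoustic features. That illustrates, under the stated assumptions, why a speaker-independent conditioning signal can sidestep the feature mismatch the abstract attributes to WaveNet vocoder-based solutions.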
doi:10.21437/interspeech.2019-1514 dblp:conf/interspeech/TianC019 fatcat:ebmoa63td5gbzlruatksmamnqq