Resynthesizing the GECO speech corpus with VocalTractLab

Konstantin Sering, Niels Stehwien, Yingming Gao, Martin V. Butz, R. Harald Baayen
2019 Zenodo  
We address the challenge of learning an inverse mapping from acoustic features to the control parameters of a vocal tract simulator. As a first step, we synthesize an articulatory corpus consisting of control parameters and waveforms, using VocalTractLab (VTL; [1]) as the vocal tract simulator. The synthesis is based on a concatenative approach that combines VTL gestures according to a SAMPA transcription. The SAMPA transcriptions are taken from the GECO corpus [2], a spontaneous speech corpus of southern German. The presented approach uses the phone durations and extracted pitch contours to create gesture files for VTL. The resynthesis of the GECO corpus yields 53960 valid spliced-out word samples, totalling 6 hours and 23 minutes of synthesized speech. The synthesis quality is mediocre. We believe that the synthesized samples reflect some of the variability found in natural human speech.
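
As a rough sanity check on the reported corpus size, the mean duration of a spliced-out word sample can be derived from the two figures given in the abstract. The snippet below is only an illustrative calculation; the variable and function names are our own, and because the total duration is reported to the nearest minute, the result is approximate.

```python
# Figures taken from the abstract; all names here are illustrative only.
N_WORD_SAMPLES = 53960   # valid spliced-out word samples
TOTAL_HOURS = 6          # reported total duration, hours part
TOTAL_MINUTES = 23       # reported total duration, minutes part


def mean_sample_duration_seconds(n_samples: int, hours: int, minutes: int) -> float:
    """Return the average duration per sample in seconds."""
    total_seconds = hours * 3600 + minutes * 60
    return total_seconds / n_samples


mean_duration = mean_sample_duration_seconds(
    N_WORD_SAMPLES, TOTAL_HOURS, TOTAL_MINUTES
)
print(f"{mean_duration:.3f} s per word sample")  # prints "0.426 s per word sample"
```

An average of roughly 0.43 seconds per sample is consistent with the samples being single words rather than longer utterances.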
doi:10.5281/zenodo.4115331 fatcat:vj566p6e3vd4bjvspfv5gewole