Development and Evaluation of Speech Synthesis Corpora for Latvian

Roberts Dargis, Peteris Paikens, Normunds Gruzitis, Ilze Auzina, Agate Akmane
2020 International Conference on Language Resources and Evaluation  
Text-to-speech (TTS) systems are necessary for any language to ensure accessibility and availability of digital language services. Recent advances in neural speech synthesis have enabled the development of such systems with a data-driven approach that does not require significant development of language-specific tools. However, smaller languages often lack speech corpora that would be sufficient for training current neural TTS models, which require at least 30 hours of good quality audio recordings from a single speaker in a noiseless environment with matching transcriptions. Making such a corpus manually can be cost-prohibitive. This paper presents an unsupervised approach to obtain a suitable corpus from unannotated recordings using automated speech recognition for transcription, as well as automated speaker segmentation and identification. The proposed method and software tools are applied and evaluated on a case study for developing a corpus suitable for Latvian speech synthesis based on Latvian public radio archive data.
dblp:conf/lrec/DargisPGAA20 fatcat:aqx2smzvcjatlcn5p2lfgntkxq