Disfluency Insertion for Spontaneous TTS: Formalization and Proof of Concept [chapter]

Raheel Qader, Gwénolé Lecorvé, Damien Lolive, Pascale Sébillot
2018 Lecture Notes in Computer Science  
This paper presents exploratory work on automatically inserting disfluencies in text-to-speech (TTS) systems. The objective is to make TTS more spontaneous and expressive. To achieve this, we propose to focus on the linguistic level of speech through the insertion of pauses, repetitions and revisions. We formalize the problem as a theoretical process, where transformations are iteratively composed. This is a novel contribution since most previous work either focuses on the detection or ... ning of linguistic disfluencies in speech transcripts, or concentrates solely on acoustic phenomena in TTS, especially pauses. We present a first implementation of the proposed process using conditional random fields and language models. The objective and perceptual evaluations conducted on an English corpus of spontaneous speech show that our proposal is effective at generating disfluencies and highlight perspectives for future improvements.
doi:10.1007/978-3-030-00810-9_4 fatcat:pcko3mgs3rgn3lxc7td4ocjvve