In this paper we introduce a corpus-based 2D videorealistic audio-visual synthesis system. The system combines a concatenative Text-to-Speech (TTS) system with a concatenative Text-to-Visual (TTV) system into a Text-to-Audio-Visual-Speech (TTAVS) system with synchronized audio and lip movement. For the concatenative TTS we use a Finite State Machine approach to select non-uniform, variable-size audio segments. Analogous to the TTS, a k-Nearest-Neighbor algorithm is applied to select the visual segments.

doi:10.21437/interspeech.2005-789
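The k-Nearest-Neighbor selection of visual segments mentioned above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the feature layout, distance metric, and all names (`knn_select`, the `features` key) are assumptions.

```python
import math

def knn_select(target, corpus, k=3):
    """Return the k corpus segments whose feature vectors are closest
    (Euclidean distance) to the target specification."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return sorted(corpus, key=lambda seg: dist(seg["features"], target))[:k]

# Toy corpus: each visual segment carries an identifier and a feature
# vector (e.g. mouth-shape parameters); the real system would derive
# these features from the recorded video corpus.
corpus = [
    {"id": "seg_a", "features": [0.1, 0.9]},
    {"id": "seg_b", "features": [0.5, 0.5]},
    {"id": "seg_c", "features": [0.9, 0.1]},
]
best = knn_select([0.45, 0.55], corpus, k=2)
```

In a full concatenative TTV system, the chosen candidates would additionally be ranked by a join cost so that adjacent visual segments concatenate smoothly.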