FSM and k-nearest-neighbor for corpus based video-realistic audio-visual synthesis
Interspeech 2005
In this paper we introduce a corpus-based 2D video-realistic audio-visual synthesis system. The system combines a concatenative Text-to-Speech (TTS) system with a concatenative Text-to-Visual (TTV) system into a Text-to-Audio-Visual-Speech (TTAVS) system with synchronized audio and lip movements. For the concatenative TTS we use a Finite State Machine approach to select non-uniform, variable-size audio segments. Analogous to the TTS, a k-Nearest-Neighbor algorithm is applied to select the visual segments.
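The k-Nearest-Neighbor selection step mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the segment corpus, feature vectors, and the `knn_select` helper are hypothetical, and a plain Euclidean distance over per-segment feature vectors stands in for whatever visual features the authors actually use.

```python
import math

def knn_select(target, candidates, k=3):
    """Return the k candidate visual segments closest to the target
    feature vector (Euclidean distance); a stand-in for the TTV
    segment-selection step described in the abstract."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return sorted(candidates, key=lambda c: dist(c["features"], target))[:k]

# Toy corpus of visual segments with (hypothetical) 2D feature vectors.
corpus = [
    {"id": "seg1", "features": [0.10, 0.20]},
    {"id": "seg2", "features": [0.90, 0.80]},
    {"id": "seg3", "features": [0.15, 0.25]},
]

best = knn_select([0.12, 0.22], corpus, k=2)
print([s["id"] for s in best])  # the two nearest segments
```

In a full concatenative system the selected neighbors would additionally be scored for join smoothness with adjacent segments before one is chosen.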
doi:10.21437/interspeech.2005-789
fatcat:46bntxjvbfh2llqatyhlaf5rei