Model-Based Synthesis of Visual Speech Movements from 3D Video
2009
EURASIP Journal on Audio, Speech, and Music Processing
speech movements from speech audio input. ...
In this paper we describe a method for the synthesis of visual speech movements using a hybrid unit selection/model-based approach. ...
The final group of visual synthesis techniques take advantage of the audio data to map into the space of visual speech movements. ...
doi:10.1155/2009/597267
fatcat:4lzd4mzhdzbl7cuzmjv3upl524
Model-based synthesis of visual speech movements from 3D video
2009
SIGGRAPH '09: Posters on - SIGGRAPH '09
speech movements from speech audio input. ...
In this paper we describe a method for the synthesis of visual speech movements using a hybrid unit selection/model-based approach. ...
The final group of visual synthesis techniques take advantage of the audio data to map into the space of visual speech movements. ...
doi:10.1145/1599301.1599309
dblp:conf/siggraph/EdgeHJ09
fatcat:dc2sakecwndfddycztewigybom
Visual speech synthesis from 3D video
2006
3rd European Conference on Visual Media Production (CVMP 2006). Part of the 2nd Multimedia Conference 2006
unpublished
The framework allows visual speech synthesis from captured 3D video with minimal user intervention. ...
In this paper we introduce a process for visual speech synthesis from 3D video capture to reproduce the dynamics of 3D face shape and appearance. ...
Conclusions: A data-driven approach to 3D visual speech synthesis based on captured 3D video of faces has been presented. ...
doi:10.1049/cp:20061940
fatcat:x5m6lmhk45b7pbaucbrbydjbbi
Visual Speech Synthesis from 3D Video
english
2007
Proceedings of the Second International Conference on Computer Graphics Theory and Applications
unpublished
A stereo capture system is used to reconstruct 3D models of a speaker producing sentences from the TIMIT corpus. ...
It is believed that such a structure will be appropriate to various areas of speech modeling, in particular the synthesis of speech lip movements. ...
These properties show that the speech manifold is highly structured, and potentially this structure can aid applications such as visual speech synthesis. ...
doi:10.5220/0002080400570062
fatcat:6kuyc4p7bnhorgx7jgvpwstcgy
Speech-driven face synthesis from 3D video
Proceedings. 2nd International Symposium on 3D Data Processing, Visualization and Transmission, 2004. 3DPVT 2004.
This paper presents a framework for speech-driven synthesis of real faces from a corpus of 3D video of a person speaking. ...
Video-rate capture of dynamic 3D face shape and colour appearance provides the basis for a visual speech synthesis model. ...
Face Synthesis from Speech: In this section we present a framework for visual 3D face synthesis driven by speech. ...
doi:10.1109/tdpvt.2004.1335143
dblp:conf/3dpvt/YpsilosHTJ04
fatcat:oondzyrefvdovoaaegttgv7dvq
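The entry above builds a speech-driven mapping from audio into captured 3D face data. As a purely illustrative sketch, not the cited framework, the core audio-to-visual step can be approximated by regressing per-frame acoustic features onto low-dimensional face-shape parameters; the feature choices, dimensions, and the scikit-learn Ridge regressor below are assumptions made for the example.

```python
# Hypothetical sketch of a speech-to-face mapping: regress per-frame acoustic
# features (e.g. MFCCs) onto low-dimensional 3D face parameters (e.g. PCA
# coefficients of captured face shapes).  Random stand-in data throughout.
import numpy as np
from sklearn.linear_model import Ridge  # assumed dependency, not from the paper

n_frames = 1000
mfcc = np.random.randn(n_frames, 13)      # acoustic features per video frame
face_pca = np.random.randn(n_frames, 20)  # visual parameters per video frame

model = Ridge(alpha=1.0).fit(mfcc, face_pca)   # learn the audio-to-visual map
new_speech = np.random.randn(50, 13)           # unseen audio features
predicted_face = model.predict(new_speech)     # shape (50, 20) face parameters
print(predicted_face.shape)
```

The cited framework builds its model from captured 3D video rather than a simple linear regression; the sketch only shows the shape of the problem: frame-synchronous audio features in, visual parameters out.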
Text-To-Visual Speech in Chinese Based on Data-Driven Approach
2005
Journal of Software (Chinese)
This paper describes a Chinese text-to-visual speech synthesis system based on data-driven (sample based) approach, which is realized by short video segments concatenation. ...
By combining with the acoustic Text-To-Speech (TTS) synthesis, a Chinese text-to-visual speech synthesis system is realized. ...
In visual speech analysis and synthesis, some approaches use 2D parameters and ignore the 3D information [21, 22]; others extract 3D viseme parameters with a 3D face model, but constructing a 3D face model ...
doi:10.1360/jos161054
fatcat:xzwzseu5zndgzicbbmxp53sjrq
High quality lip-sync animation for 3D photo-realistic talking head
2012
2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
In a real-time demonstration, the life-like 3D talking head can take any input text, convert it into speech and render lip-synced speech animation photo-realistically. ...
In training, super feature vectors consisting of 3D geometry, texture and speech are augmented together to train a statistical, multi-streamed, Hidden Markov Model (HMM). ...
[2, 3, 4, 6] show some image-based speech animation that cannot be distinguished from recorded video. ...
doi:10.1109/icassp.2012.6288925
dblp:conf/icassp/WangHS12
fatcat:recujpae2jefhhxaypylaviqwy
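The abstract above trains a multi-stream HMM on "super feature vectors" that stack 3D geometry, texture and speech. A minimal sketch of that feature-stacking idea follows, assuming the streams are already resampled to a common frame rate; the single-stream GaussianHMM from hmmlearn is a stand-in assumption, not the authors' multi-stream implementation.

```python
# Illustrative sketch: build per-frame "super vectors" by stacking geometry,
# texture and acoustic streams (with first-order deltas for dynamics), then
# fit a Gaussian HMM as a stand-in for the paper's multi-stream model.
import numpy as np
from hmmlearn.hmm import GaussianHMM  # assumed dependency, not from the paper

n_frames = 500
geom = np.random.randn(n_frames, 30)    # e.g. PCA coefficients of the 3D mesh
tex = np.random.randn(n_frames, 20)     # e.g. PCA coefficients of the texture
audio = np.random.randn(n_frames, 13)   # e.g. MFCCs resampled to the video rate

def with_deltas(x):
    """Append first-order deltas so the model sees feature dynamics."""
    return np.hstack([x, np.gradient(x, axis=0)])

super_vectors = np.hstack([with_deltas(geom), with_deltas(tex), with_deltas(audio)])

hmm = GaussianHMM(n_components=8, covariance_type="diag", n_iter=20)
hmm.fit(super_vectors)           # one long training sequence
print(hmm.score(super_vectors))  # log-likelihood of the training data
```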
Rendering a personalized photo-real talking head from short video footage
2010
2010 7th International Symposium on Chinese Spoken Language Processing
The generated trajectory is then used as a guide to select, from the original training database, an optimal sequence of lips images which are then stitched back to a background head video. ...
For as short as 20 minutes recording of audio/video footage, the proposed system can synthesize a highly photo-real talking head in sync with the given speech signals (natural or TTS synthesized). ...
In HMM-based visual speech synthesis, audio and video are jointly modeled in HMMs and the visual parameters are generated from HMMs by using the dynamic ("delta") constraints of the features [8]. ...
doi:10.1109/iscslp.2010.5684834
dblp:conf/iscslp/WangHQS10
fatcat:blnbbpgy4nakjmot2ypo76lwkm
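The entry above generates a visual trajectory and then selects an optimal sequence of stored lip images guided by it. Below is a hedged sketch of that kind of trajectory-guided unit selection: an additive target cost (distance to the guide) plus concatenation cost (distance between consecutive picks), solved exactly by dynamic programming. The cost definitions, weight, and data are assumptions, not the paper's.

```python
# Sketch of trajectory-guided unit selection via dynamic programming.
import numpy as np

def select_frames(guide, database, w_concat=1.0):
    """guide: (T, D) generated visual trajectory; database: (N, D) stored frames."""
    T, N = len(guide), len(database)
    target = np.linalg.norm(guide[:, None, :] - database[None, :, :], axis=2)     # (T, N)
    concat = np.linalg.norm(database[:, None, :] - database[None, :, :], axis=2)  # (N, N)

    cost = np.zeros((T, N))
    back = np.zeros((T, N), dtype=int)
    cost[0] = target[0]
    for t in range(1, T):
        total = cost[t - 1][:, None] + w_concat * concat  # previous cost + transition
        back[t] = np.argmin(total, axis=0)                # best predecessor per frame
        cost[t] = target[t] + total[back[t], np.arange(N)]

    path = [int(np.argmin(cost[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return path[::-1]  # indices of selected database frames, one per time step

# Toy usage with random data standing in for real lip-image features.
print(select_frames(np.random.randn(10, 5), np.random.randn(50, 5)))
```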
Audiovisual Speech Synthesis using Tacotron2
[article]
2021
arXiv
pre-print
In this paper, we propose and compare two audiovisual speech synthesis systems for 3D face models. ...
generated from professionally recorded videos. ...
Video Synthesis Evaluation: We need your help evaluating video samples from an audio-visual speech synthesis system. ...
arXiv:2008.00620v2
fatcat:cmww55eotffpjp6nwkl5kgmme4
Effect Of Visual Speech In Sign Speech Synthesis
2009
Zenodo
This article investigates the contribution of synthesized visual speech. Synthesis of visual speech by a computer consists of animation, in particular of lip movements. ...
Visual speech is also a necessary part of the non-manual component of a sign language. An appropriate methodology is proposed to determine the quality and accuracy of synthesized visual speech. ...
Data for Synthesis Process: Various data sources are required to create automatic 3D synthesis of sign speech (an avatar animation). ...
doi:10.5281/zenodo.1332124
fatcat:n2d4xioo2fha7pxbhslzqlxbwi
Acoustic-visual synthesis technique using bimodal unit-selection
2013
EURASIP Journal on Audio, Speech, and Music Processing
This paper presents a bimodal acoustic-visual synthesis technique that concurrently generates the acoustic speech signal and a 3D animation of the speaker's outer face. ...
The different synthesis steps are similar to typical concatenative speech synthesis but are generalized to the acoustic-visual domain. ...
Obviously, to be in the same synthesis conditions, we did not use the real speaker videos, but a 3D reconstruction of the face based on the recorded data. ...
doi:10.1186/1687-4722-2013-16
fatcat:wx7vs77jabg5fn3xkwyplxntfq
Talking Faces: Audio-to-Video Face Generation
[chapter]
2022
Advances in Computer Vision and Pattern Recognition
Talking face generation aims at synthesizing coherent and realistic face sequences given an input speech. ...
LRS3 contains thousands of spoken sentences from TED and TEDx speech videos. ...
[9] introduced a 3D blendshape model animated by 3D rotation and expression coefficients predicted only from the input speech. Karras et al. ...
doi:10.1007/978-3-030-87664-7_8
fatcat:5qh2bxrthrbthgjwjzlmm3je4i
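Among the methods surveyed in the chapter above is a 3D blendshape model animated by coefficients predicted from speech. The following is a generic sketch of the standard blendshape formulation (neutral mesh plus weighted per-shape offsets); it is an assumed textbook form, not the cited model's exact parameterization.

```python
# Generic blendshape sketch: a frame's mesh is the neutral face plus a
# weighted sum of per-blendshape offsets; the weights are what a
# speech-to-animation model would predict per frame.
import numpy as np

def blendshape_frame(neutral, blendshapes, weights):
    """neutral: (V, 3) vertices; blendshapes: (K, V, 3); weights: (K,)."""
    offsets = blendshapes - neutral[None, :, :]          # per-shape displacement
    return neutral + np.tensordot(weights, offsets, axes=1)

# Toy example: 4 blendshapes over a 100-vertex mesh, with made-up weights
# standing in for per-frame coefficients predicted from speech.
neutral = np.random.randn(100, 3)
blendshapes = neutral[None] + 0.1 * np.random.randn(4, 100, 3)
weights = np.array([0.2, 0.0, 0.7, 0.1])
print(blendshape_frame(neutral, blendshapes, weights).shape)  # (100, 3)
```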
On the quality of an expressive audiovisual corpus: a case study of acted speech
2017
The 14th International Conference on Auditory-Visual Speech Processing
In the context of developing an expressive audiovisual speech synthesis system, the quality of the audiovisual corpus from which the 3D visual data will be extracted is important. ...
We have observed different modalities: audio, real video, 3D-extracted data, as unimodal presentations and bimodal presentations (with audio). ...
Since our earlier work in audiovisual speech synthesis, we have considered the acoustic and visual channels together [10, 11]. ...
doi:10.21437/avsp.2017-11
dblp:conf/avsp/OuniDC17
fatcat:ziypnwsgsvawjfkuyr7xnpr4l4
Speech-assisted facial expression analysis and synthesis for virtual conferencing systems
2003
2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)
From the input speech, the mouth shape can be estimated from the audio-visual model. Thus, the large search space of mouth appearance can be reduced for mouth tracking. ...
In this paper, the concept of speech-assisted facial expression analysis and synthesis is proposed, which shows that the speech-driven facial animation technique not only can be used for expression synthesis ...
[Figure 2, block diagram: Visual Feature Extraction, Audio-to-Visual Conversion / Visual Interpretation, FAP-to-Texture Conversion, and 3-D Synthesis of a realistic avatar.] ...
doi:10.1109/icme.2003.1221365
dblp:conf/icmcs/ChangHHC03
fatcat:6tlmll7ndrfilitmka4elkbpsm
Continuous ultrasound based tongue movement video synthesis from speech
2016
2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
In this paper, a framework to synthesize continuous ultrasound tongue movement video from speech is presented. ...
Visualizing the movement of the tongue can improve speech intelligibility and also helps in learning a second language. However, little research has investigated this topic. ...
This paper proposes a training and synthesis framework to build the mapping from acoustic speech signals to continuous tongue movement video. ...
doi:10.1109/icassp.2016.7471970
dblp:conf/icassp/WangYWZ16
fatcat:5fmx4qdph5apdo7hrzm3f4uo6i