Model-Based Synthesis of Visual Speech Movements from 3D Video

James D. Edge, Adrian Hilton, Philip Jackson
2009 EURASIP Journal on Audio, Speech, and Music Processing  
speech movements from speech audio input.  ...  In this paper we describe a method for the synthesis of visual speech movements using a hybrid unit selection/model-based approach.  ...  The final group of visual synthesis techniques take advantage of the audio data to map into the space of visual speech movements.  ... 
doi:10.1155/2009/597267 fatcat:4lzd4mzhdzbl7cuzmjv3upl524

Model-based synthesis of visual speech movements from 3D video

J. D. Edge, A. Hilton, P. Jackson
2009 SIGGRAPH '09: Posters on - SIGGRAPH '09  
speech movements from speech audio input.  ...  In this paper we describe a method for the synthesis of visual speech movements using a hybrid unit selection/model-based approach.  ...  The final group of visual synthesis techniques take advantage of the audio data to map into the space of visual speech movements.  ... 
doi:10.1145/1599301.1599309 dblp:conf/siggraph/EdgeHJ09 fatcat:dc2sakecwndfddycztewigybom

Visual speech synthesis from 3D video

J.D. Edge, A. Hilton
2006 3rd European Conference on Visual Media Production (CVMP 2006). Part of the 2nd Multimedia Conference 2006   unpublished
The framework allows visual speech synthesis from captured 3D video with minimal user intervention.  ...  In this paper we introduce a process for visual speech synthesis from 3D video capture to reproduce the dynamics of 3D face shape and appearance.  ...  CONCLUSIONS A data-driven approach to 3D visual speech synthesis based on captured 3D video of faces has been presented.  ... 
doi:10.1049/cp:20061940 fatcat:x5m6lmhk45b7pbaucbrbydjbbi


2007 Proceedings of the Second International Conference on Computer Graphics Theory and Applications   unpublished
A stereo capture system is used to reconstruct 3D models of a speaker producing sentences from the TIMIT corpus.  ...  It is believed that such a structure will be appropriate to various areas of speech modeling, in particular the synthesis of speech lip movements.  ...  These properties show that the speech manifold is highly structured, and potentially this structure can aid applications such as visual speech synthesis.  ... 
doi:10.5220/0002080400570062 fatcat:6kuyc4p7bnhorgx7jgvpwstcgy

Speech-driven face synthesis from 3D video

I.A. Ypsilos, A. Hilton, A. Turkmani, P.J.B. Jackson
Proceedings. 2nd International Symposium on 3D Data Processing, Visualization and Transmission, 2004. 3DPVT 2004.  
This paper presents a framework for speech-driven synthesis of real faces from a corpus of 3D video of a person speaking.  ...  Video-rate capture of dynamic 3D face shape and colour appearance provides the basis for a visual speech synthesis model.  ...  Face Synthesis from Speech In this section we present a framework for visual 3D face synthesis driven by speech.  ... 
doi:10.1109/tdpvt.2004.1335143 dblp:conf/3dpvt/YpsilosHTJ04 fatcat:oondzyrefvdovoaaegttgv7dvq

Text-To-Visual Speech in Chinese Based on Data-Driven Approach

Zhi-Ming WANG
2005 Journal of Software (Chinese)  
This paper describes a Chinese text-to-visual speech synthesis system based on data-driven (sample based) approach, which is realized by short video segments concatenation.  ...  By combining with the acoustic Text-To-Speech (TTS) synthesis, a Chinese text-to-visual speech synthesis system is realized.  ...  In visual speech analysis and synthesis, some people use 2D parameters, ignore the 3D information [21, 22]; some people extract 3D viseme parameters by 3D face model, but constructing a 3D face model  ... 
doi:10.1360/jos161054 fatcat:xzwzseu5zndgzicbbmxp53sjrq

High quality lip-sync animation for 3D photo-realistic talking head

Lijuan Wang, Wei Han, Frank K. Soong
2012 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
In a real-time demonstration, the life-like 3D talking head can take any input text, convert it into speech and render lipsynced speech animation photo-realistically.  ...  In training, super feature vectors consisting of 3D geometry, texture and speech are augmented together to train a statistical, multi-streamed, Hidden Markov Model (HMM).  ...  [2, 3, 4, 6] show some image-based speech animation that cannot be distinguished from recorded video.  ... 
doi:10.1109/icassp.2012.6288925 dblp:conf/icassp/WangHS12 fatcat:recujpae2jefhhxaypylaviqwy

Rendering a personalized photo-real talking head from short video footage

Lijuan Wang, Wei Han, Xiaojun Qian, Frank K. Soong
2010 2010 7th International Symposium on Chinese Spoken Language Processing  
The generated trajectory is then used as a guide to select, from the original training database, an optimal sequence of lips images which are then stitched back to a background head video.  ...  For as short as 20 minutes recording of audio/video footage, the proposed system can synthesize a highly photo-real talking head in sync with the given speech signals (natural or TTS synthesized).  ...  In HMM-based visual speech synthesis, audio and video are jointly modeled in HMMs and the visual parameters are generated from HMMs by using the dynamic ("delta") constraints of the features [8].  ... 
doi:10.1109/iscslp.2010.5684834 dblp:conf/iscslp/WangHQS10 fatcat:blnbbpgy4nakjmot2ypo76lwkm

Audiovisual Speech Synthesis using Tacotron2 [article]

Ahmed Hussen Abdelaziz, Anushree Prasanna Kumar, Chloe Seivwright, Gabriele Fanelli, Justin Binder, Yannis Stylianou, Sachin Kajarekar
2021 arXiv   pre-print
In this paper, we propose and compare two audiovisual speech synthesis systems for 3D face models.  ...  generated from professionally recorded videos.  ...  Video Synthesis Evaluation We need your help evaluating video samples from an audio-visual speech synthesis system.  ... 
arXiv:2008.00620v2 fatcat:cmww55eotffpjp6nwkl5kgmme4

Effect Of Visual Speech In Sign Speech Synthesis

Zdenek Krnoul
2009 Zenodo  
This article investigates a contribution of synthesized visual speech. Synthesis of visual speech expressed by a computer consists in an animation in particular movements of lips.  ...  Visual speech is also necessary part of the non-manual component of a sign language. Appropriate methodology is proposed to determine the quality and the accuracy of synthesized visual speech.  ...  Data for Synthesis Process Various data sources are required to create automatic 3D synthesis of sign speech (an avatar animation).  ... 
doi:10.5281/zenodo.1332124 fatcat:n2d4xioo2fha7pxbhslzqlxbwi

Acoustic-visual synthesis technique using bimodal unit-selection

Slim Ouni, Vincent Colotte, Utpala Musti, Asterios Toutios, Brigitte Wrobel-Dautcourt, Marie-Odile Berger, Caroline Lavecchia
2013 EURASIP Journal on Audio, Speech, and Music Processing  
This paper presents a bimodal acoustic-visual synthesis technique that concurrently generates the acoustic speech signal and a 3D animation of the speaker's outer face.  ...  The different synthesis steps are similar to typical concatenative speech synthesis but are generalized to the acoustic-visual domain.  ...  Obviously, to be in the same synthesis conditions, we did not use the real speaker videos, but a 3D reconstruction of the face based on the recorded data.  ... 
doi:10.1186/1687-4722-2013-16 fatcat:wx7vs77jabg5fn3xkwyplxntfq

Talking Faces: Audio-to-Video Face Generation [chapter]

Yuxin Wang, Linsen Song, Wayne Wu, Chen Qian, Ran He, Chen Change Loy
2022 Advances in Computer Vision and Pattern Recognition  
Abstract: Talking face generation aims at synthesizing coherent and realistic face sequences given an input speech.  ...  LRS3 contains thousands of spoken sentences from TED and TEDx speech videos.  ...  [9] introduced a 3D blendshape model animated by 3D rotation and expression coefficients predicted only from the input speech. Karras et al.  ... 
doi:10.1007/978-3-030-87664-7_8 fatcat:5qh2bxrthrbthgjwjzlmm3je4i

On the quality of an expressive audiovisual corpus: a case study of acted speech

Slim Ouni, Sara Dahmani, Vincent Colotte
2017 The 14th International Conference on Auditory-Visual Speech Processing  
In the context of developing an expressive audiovisual speech synthesis system, the quality of the audiovisual corpus from which the 3D visual data will be extracted is important.  ...  We have observed different modalities: audio, real video, 3D-extracted data, as unimodal presentations and bimodal presentations (with audio).  ...  Since our earlier work in audiovisual speech synthesis, we consider both channels acoustic and visual together [10, 11] .  ... 
doi:10.21437/avsp.2017-11 dblp:conf/avsp/OuniDC17 fatcat:ziypnwsgsvawjfkuyr7xnpr4l4

Speech-assisted facial expression analysis and synthesis for virtual conferencing systems

Yao-Jen Chang, Chao-Kuei Heish, Pei-Wei Hsu, Yung-Chang Chen
2003 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)  
From the input speech, the mouth shape can be estimated from the audio-visual model. Thus, the large search space of mouth appearance can be reduced for mouth tracking.  ...  In this paper, the concept of speech-assisted facial expression analysis and synthesis is proposed, which shows that the speech-driven facial animation technique not only can be used for expression synthesis  ...  [Figure 2: block diagram; labels include Visual Feature Extraction, Audio-to-Visual Conversion, Visual Interpretation, FAP, Video, FAP-to-Texture Conversion, Texture, 3-D Synthesis, Realistic Avatar.]  ... 
doi:10.1109/icme.2003.1221365 dblp:conf/icmcs/ChangHHC03 fatcat:6tlmll7ndrfilitmka4elkbpsm

Continuous ultrasound based tongue movement video synthesis from speech

Jianrong Wang, Yalong Yang, Jianguo Wei, Ju Zhang
2016 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
In this paper, a framework to synthesize continuous ultrasound tongue movement video from speech is presented.  ...  Visualizing the movement of tongue can improve speech intelligibility and also helps learning a second language. However, hardly any research has been investigated for this topic.  ...  This paper proposes a training and synthesis framework to build the mapping from acoustic speech signals to continuous tongue movement video.  ... 
doi:10.1109/icassp.2016.7471970 dblp:conf/icassp/WangYWZ16 fatcat:5fmx4qdph5apdo7hrzm3f4uo6i