
A deep learning approach for generalized speech animation

Sarah Taylor, Taehwan Kim, Yisong Yue, Moshe Mahler, James Krahe, Anastasio Garcia Rodriguez, Jessica Hodgins, Iain Matthews
2017 ACM Transactions on Graphics  
We introduce a simple and effective deep learning approach to automatically generate natural-looking speech animation that synchronizes to input speech.  ...  A machine learning approach is used to learn a regression function mapping phoneme labels to speech animation.  ...  Scott Jones at Lucasfilm and Hao Li at USC generously provided facial rigs. Thanks to the diverse members of Disney Research Pittsburgh who recorded foreign language speech examples.  ... 
doi:10.1145/3072959.3073699 fatcat:w42eaqtt4rbudmowb63veaofzq
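The entry above describes learning a regression from windows of phoneme labels to animation parameters. Below is a minimal, purely illustrative sketch of that idea using one-hot phoneme windows and a least-squares fit in place of the paper's deep network; the phoneme inventory, window size, and parameter count are invented for the example.

```python
import numpy as np

# Illustrative setup (not the paper's): a tiny phoneme inventory, a
# 5-phoneme context window, and 3 animation parameters per frame.
PHONEMES = ["sil", "ah", "b", "s"]
WINDOW = 5
N_PARAMS = 3

def one_hot_window(labels, center, n_phon=len(PHONEMES)):
    """Encode a window of phoneme labels centred on `center` as one vector."""
    half = WINDOW // 2
    vec = np.zeros(WINDOW * n_phon)
    for i, t in enumerate(range(center - half, center + half + 1)):
        t = min(max(t, 0), len(labels) - 1)  # clamp at sequence edges
        vec[i * n_phon + labels[t]] = 1.0
    return vec

# Toy training data: a random phoneme sequence and matching parameters
# generated from a known linear map plus noise.
rng = np.random.default_rng(0)
seq = rng.integers(0, len(PHONEMES), size=200)
X = np.stack([one_hot_window(seq, t) for t in range(len(seq))])
true_W = rng.normal(size=(X.shape[1], N_PARAMS))
Y = X @ true_W + 0.01 * rng.normal(size=(len(seq), N_PARAMS))

# Least-squares regression standing in for the learned deep regressor.
W, *_ = np.linalg.lstsq(X, Y, rcond=None)
pred = X @ W
print(np.mean((pred - Y) ** 2))  # small reconstruction error
```

A real system would replace the linear fit with a neural network and the toy parameters with tracked facial data, but the windowed phoneme encoding and per-frame regression structure are the same.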

End-to-end Learning for 3D Facial Animation from Raw Waveforms of Speech [article]

Hai X. Pham, Yuting Wang, Vladimir Pavlovic
2017 arXiv   pre-print
We present a deep learning framework for real-time speech-driven 3D facial animation from just raw waveforms.  ...  In particular, our deep model is able to learn the latent representations of time-varying contextual information and affective states within the speech.  ...  DEEP END-TO-END LEARNING FOR 3D FACE SYNTHESIS FROM SPEECH A.  ... 
arXiv:1710.00920v2 fatcat:bis4z3hys5dxhg2tf3hx24b7eq

Speech-Driven 3D Facial Animation with Implicit Emotional Awareness: A Deep Learning Approach

Hai X. Pham, Samuel Cheung, Vladimir Pavlovic
2017 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)  
We introduce a long short-term memory recurrent neural network (LSTM-RNN) approach for real-time facial animation, which automatically estimates head rotation and facial action unit activations of a speaker  ...  Experiments on an evaluation dataset of different speakers across a wide range of affective states demonstrate promising results of our approach in real-time speech-driven facial animation.  ...  Conclusion and Future Work This paper presents a deep recurrent learning approach for speech-driven 3D facial animation.  ... 
doi:10.1109/cvprw.2017.287 dblp:conf/cvpr/PhamCP17 fatcat:2pgkt24qjfg7vl2iciqujrexte
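The LSTM-RNN mapping described above, from per-frame speech features to action-unit (AU) activations, can be sketched as a single recurrent cell applied frame by frame. Everything below is a toy stand-in: the weights are random, and the feature, hidden, and output sizes are invented for illustration.

```python
import numpy as np

# Minimal single-layer LSTM forward pass in numpy, showing how a recurrent
# model maps a sequence of speech features to per-frame AU activations.
rng = np.random.default_rng(1)
D_IN, D_H, D_OUT = 26, 16, 8   # e.g. acoustic dims, hidden units, AUs

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Randomly initialised gate weights (input, forget, candidate, output).
W = rng.normal(scale=0.1, size=(4, D_H, D_IN + D_H))
b = np.zeros((4, D_H))
W_out = rng.normal(scale=0.1, size=(D_OUT, D_H))

def lstm_forward(features):
    h = np.zeros(D_H)
    c = np.zeros(D_H)
    outputs = []
    for x_t in features:
        z = np.concatenate([x_t, h])
        i = sigmoid(W[0] @ z + b[0])    # input gate
        f = sigmoid(W[1] @ z + b[1])    # forget gate
        g = np.tanh(W[2] @ z + b[2])    # candidate cell state
        o = sigmoid(W[3] @ z + b[3])    # output gate
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(sigmoid(W_out @ h))  # AU activations in (0, 1)
    return np.stack(outputs)

frames = rng.normal(size=(50, D_IN))    # 50 frames of toy speech features
aus = lstm_forward(frames)
print(aus.shape)                        # one AU vector per input frame
```

The recurrence is what lets each output frame depend on the speech context that preceded it, which is the property the entry emphasizes for real-time animation.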

Facial Modelling and Animation: An Overview of The State-of-The Art

Samia Shakir, Ali Al-Azza
2021 Iraqi Journal for Electrical And Electronic Engineering  
...  moving pictures experts group-4 (MPEG-4) facial animation, physics-based muscle modeling, performance-driven facial animation, visual speech animation.  ...  This paper reviewed the approaches used in facial modeling and animation and described their strengths and weaknesses.  ...  With this approach it is difficult to generate a single AU without affecting the other AUs. With the recent rise of deep learning, CNNs have been widely used to extract AU features. Zhao et al  ... 
doi:10.37917/ijeee.18.1.4 fatcat:yububcsiznam3ozsazq5kn6pmi

A Translation System That Converts English Text to American Sign Language Enhanced with Deep Learning Modules

In doing so, we are able to achieve both the accuracy of a rule-based approach and the scale of a deep learning one.  ...  (NLP) and Deep Learning.  ...  The enhancement module to this phase is the paraphrase generator, which uses deep learning.  ... 
doi:10.35940/ijitee.l3781.1081219 fatcat:xu7fhimjyneynczptah2t6qmiy

Deep learning approaches for neural decoding: from CNNs to LSTMs and spikes to fMRI [article]

Jesse A. Livezey, Joshua I. Glaser
2020 arXiv   pre-print
The success of deep networks in other domains has led to a new wave of applications in neuroscience. In this article, we review deep learning approaches to neural decoding.  ...  Deep learning has been shown to be a useful tool for improving the accuracy and flexibility of neural decoding across a wide range of tasks, and we point out areas for future scientific development.  ...  Acknowledgements We would like to thank Ella Batty and Charles Frye for very helpful comments on this manuscript.  ... 
arXiv:2005.09687v1 fatcat:grboww5ptvah5npbl3xeehbady

VisemeNet: Audio-Driven Animator-Centric Speech Animation [article]

Yang Zhou, Zhan Xu, Chris Landreth, Evangelos Kalogerakis, Subhransu Maji, Karan Singh
2018 arXiv   pre-print
We present a novel deep-learning based approach to producing animator-centric speech motion curves that drive a JALI or standard FACS-based production face-rig, directly from input audio.  ...  We evaluate our results by: cross-validation to ground-truth data; animator critique and edits; visual comparison to recent deep-learning lip-synchronization solutions; and showing our approach to be resilient  ...  We thank Pif Edwards and anonymous reviewers for their valuable feedback.  ... 
arXiv:1805.09488v1 fatcat:ahgmxkuawjg7hguwc37rs5lezy

Multimodal Speech Driven Facial Shape Animation Using Deep Neural Networks

Sasan Asadiabadi, Rizwan Sadiq, Engin Erzin
2018 2018 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)  
In this paper we present a deep learning multimodal approach for speech-driven generation of face animations.  ...  Training a speaker-independent model, capable of generating different emotions of the speaker, is crucial for realistic animations.  ...  ACKNOWLEDGMENT We thank NVIDIA for donating a Titan XP within the GPU Grant Program.  ... 
doi:10.23919/apsipa.2018.8659713 dblp:conf/apsipa/AsadiabadiSE18 fatcat:hsbfi7jaxjcn7loyxdwelhs434

Emotion Dependent Facial Animation from Affective Speech [article]

Rizwan Sadiq, Sasan AsadiAbadi, Engin Erzin
2019 arXiv   pre-print
In this paper, we present a two-stage deep learning approach for affective speech driven facial shape animation. In the first stage, we classify affective speech into seven emotion categories.  ...  The proposed emotion dependent facial shape model performs better in terms of the Mean Squared Error (MSE) loss and in generating the landmark animations, as compared to training a universal model regardless  ...  In our earlier work, [22] , we proposed a deep multi-modal framework, combining the work of [19] using phoneme sequence with spectral speech features to generate facial animations.  ... 
arXiv:1908.03904v1 fatcat:olupfm2egreqdge66ez5jye3ay
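The two-stage pipeline in the entry above (classify the emotion first, then apply an emotion-dependent shape model) can be illustrated with toy stand-ins: a nearest-prototype "classifier" for stage 1 and one linear map per emotion for stage 2, in place of the paper's deep networks. All names and sizes here are invented.

```python
import numpy as np

# Stage 1 routes speech features to one of seven emotion categories;
# stage 2 applies that emotion's own regression to get shape parameters.
EMOTIONS = ["neutral", "happy", "sad", "angry", "fear", "disgust", "surprise"]
D_AUDIO, D_SHAPE = 13, 6

rng = np.random.default_rng(3)
prototypes = {e: rng.normal(size=D_AUDIO) for e in EMOTIONS}               # stage 1
shape_models = {e: rng.normal(size=(D_SHAPE, D_AUDIO)) for e in EMOTIONS}  # stage 2

def animate(features):
    # Stage 1: nearest-prototype classification (stand-in for a deep classifier).
    emotion = min(EMOTIONS,
                  key=lambda e: np.linalg.norm(features - prototypes[e]))
    # Stage 2: emotion-dependent shape regression.
    return emotion, shape_models[emotion] @ features

emotion, shape = animate(rng.normal(size=D_AUDIO))
print(emotion, shape.shape)
```

The point of the structure, as the abstract notes, is that a per-emotion model can fit each affective state more closely than a single universal regressor.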

Joint Learning of Speech-Driven Facial Motion with Bidirectional Long-Short Term Memory [chapter]

Najmeh Sadoughi, Carlos Busso
2017 Lecture Notes in Computer Science  
The face conveys a blend of verbal and nonverbal information, playing an important role in daily interaction.  ...  These relationships are ignored when facial movements across the face are generated separately.  ...  Speech-Driven Models with Deep Learning Deep learning structures are powerful at learning complex temporal relationships between modalities; hence, they are a natural framework for speech-driven models  ... 
doi:10.1007/978-3-319-67401-8_49 fatcat:6j52e3odbfekbmr6piot2lhy54

Live Speech Portraits: Real-Time Photorealistic Talking-Head Animation [article]

Yuanxun Lu, Jinxiang Chai, Xun Cao
2021 arXiv   pre-print
The first stage is a deep neural network that extracts deep audio features, along with a manifold projection that projects the features into the target person's speech space.  ...  To the best of our knowledge, we are the first to present a live system that generates personalized photorealistic talking-head animation driven only by audio signals at over 30 fps.  ...  ACKNOWLEDGMENTS We would like to thank Shuaizhen Jing for help with the TensorRT implementation. We are grateful to Qingqing Tian for the facial capture.  ... 
arXiv:2109.10595v2 fatcat:s35nqajynjeefcx67k42rpr7r4

Audio-to-Visual Speech Conversion Using Deep Neural Networks

Sarah Taylor, Akihiro Kato, Iain Matthews, Ben Milner
2016 Interspeech 2016  
We present a sliding-window deep neural network that learns a mapping from a window of acoustic features to a window of visual features from a large audio-visual speech dataset.  ...  Overlapping visual predictions are averaged to generate continuous, smoothly varying speech animation.  ...  Sliding-Window Deep Neural Network The goal of this work is to learn a model y = h(x) that can predict a realistic facial pose for any audio speech, given audio features x that encode the acoustic speech  ... 
doi:10.21437/interspeech.2016-483 dblp:conf/interspeech/TaylorKMM16 fatcat:y7nb5la3kngtrlq2vkpp7lqkay
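The overlap-averaging step described in the entry above is the concrete mechanism: the model predicts a short window of visual frames at every (overlapping) input position, and per-frame predictions from all windows covering a frame are averaged. The sketch below uses random arrays in place of a trained network's outputs; window size, frame count, and feature dimension are illustrative.

```python
import numpy as np

WIN = 5          # visual frames predicted per window
N_FRAMES = 30    # total output frames
D_VIS = 4        # visual feature dimension

rng = np.random.default_rng(2)
# One windowed prediction per centre frame (toy data in place of a DNN).
window_preds = rng.normal(size=(N_FRAMES, WIN, D_VIS))

def overlap_average(preds):
    """Average overlapping per-window predictions into one smooth track."""
    half = WIN // 2
    acc = np.zeros((N_FRAMES, D_VIS))
    count = np.zeros(N_FRAMES)
    for centre, win in enumerate(preds):
        for k in range(WIN):
            t = centre + k - half
            if 0 <= t < N_FRAMES:
                acc[t] += win[k]
                count[t] += 1
    return acc / count[:, None]

smooth = overlap_average(window_preds)
print(smooth.shape)   # one averaged visual vector per output frame
```

Each interior frame is covered by WIN overlapping windows, so the averaging acts as a smoothing filter, which is what yields the "continuous, smoothly varying" animation the abstract mentions.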

Investigating the use of recurrent motion modelling for speech gesture generation

Ylva Ferstl, Rachel McDonnell
2018 Proceedings of the 18th International Conference on Intelligent Virtual Agents - IVA '18  
Machine learning approaches have yielded only marginal success, indicating a high complexity of the speech-to-motion learning task.  ...  In this work, we explore the use of transfer learning using previous motion modelling research to improve learning outcomes for gesture generation from speech.  ...  As an alternative approach, research has explored methods to automatically generate animation for virtual humans from speech.  ... 
doi:10.1145/3267851.3267898 dblp:conf/iva/FerstlM18 fatcat:qhbirpnbuzac3mytoqhosfz5pe

Expressive talking avatar synthesis and animation

Lei Xie, Jia Jia, Helen Meng, Zhigang Deng, Lijuan Wang
2015 Multimedia tools and applications  
Specific applications may include a virtual storyteller for children, a virtual guide or presenter for a personal or commercial website, a representative of the user in computer games, and funny puppetry for  ...  The talking avatar, an animated speaking virtual character with a vivid human-like appearance and real or synthetic speech, has gradually shown its potential in applications involving human-computer intelligent  ...  Taking advantage of its rich non-linear learning ability, Wu et al. [13] develop a DNN approach for a real-time speech-driven talking avatar.  ... 
doi:10.1007/s11042-015-2460-5 fatcat:otot2mqdcjbpzoucbbsbbj7nse

DECAR: Deep Clustering for learning general-purpose Audio Representations [article]

Sreyan Ghosh and Ashish Seth and Sandesh V Katta and S. Umesh
2022 arXiv   pre-print
In this paper, we introduce DECAR (DEep Clustering for learning general-purpose Audio Representations), a self-supervised pre-training approach for learning general-purpose audio representations.  ...  , including speech, music, animal sounds, and acoustic scenes.  ...  Common general-purpose audio representation learning approaches include [10, 11, 12, 13, 14, 15, 16] .  ... 
arXiv:2110.08895v3 fatcat:6yszijdh75bdrmjf2lntsyqlrq