Learning Speech-driven 3D Conversational Gestures from Video
[article]
2021
arXiv
pre-print
We propose the first approach to automatically and jointly synthesize both the synchronous 3D conversational body and hand gestures, as well as 3D face and head animations, of a virtual character from ...
Synthesis of conversational body gestures is a multi-modal problem since many similar gestures can plausibly accompany the same input speech. ...
[16] propose a learning-based model for speech-driven generation of 2D upper-body and hand gestures from a large-scale in-the-wild video collection. ...
arXiv:2102.06837v1
fatcat:fea24vvphbh4jpuqzclfviq5fe
Realistic Speech-Driven Talking Video Generation with Personalized Pose
2020
Complexity
Existing speech-driven speaker methods cannot solve this problem well. ...
Then, the final speaker video is generated from the obtained gesture key points through the video generation network. ...
Due to the weak representational ability of the 3D face model parameter network, the key-point error obtained from the 3D face model conversion is larger, so the 3D face model needs to be used as an intermediate ...
doi:10.1155/2020/6629634
fatcat:rvmbm73y3zdtpexqfvbtcggg54
Body2Hands: Learning to Infer 3D Hands from Conversational Gesture Body Dynamics
[article]
2021
arXiv
pre-print
We propose a novel learned deep prior of body motion for 3D hand shape synthesis and estimation in the domain of conversational gestures. ...
Trained with 3D pose estimations obtained from a large-scale dataset of internet videos, our hand prediction model produces convincing 3D hand gestures given only the 3D motion of the speaker's arms as ...
More recently, [10] uses a GAN to learn a person-specific mapping from speech to 2D upper-body keypoints from in-the-wild videos. ...
arXiv:2007.12287v3
fatcat:vgwbijprbvenjjnura3yvohtgu
Learning Individual Styles of Conversational Gesture
2019
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Figure 1: Speech-to-gesture translation example. In this paper, we study the connection between conversational gesture and speech. ...
We present a method for cross-modal translation from "in-the-wild" monologue speech of a single speaker to their conversational gesture motion. ...
We approach the question from a data-driven learning perspective and ask to what extent can we predict gesture motion from the raw audio signal of speech. ...
doi:10.1109/cvpr.2019.00361
dblp:conf/cvpr/GinosarBKCOM19
fatcat:pyvacuabvjfmnf5vbxnim5wgj4
Analyzing Input and Output Representations for Speech-Driven Gesture Generation
[article]
2019
arXiv
pre-print
Specifically, we extend recent deep-learning-based, data-driven methods for speech-driven gesture generation by incorporating representation learning. ...
Our model takes speech as input and produces gestures as output, in the form of a sequence of 3D coordinates. Our approach consists of two steps. ...
[11] designed a speech-driven neural network capable of producing 3D motion sequences. ...
arXiv:1903.03369v3
fatcat:7fqzyohqevduta26qppusskame
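The two-step approach summarized in this entry (first learn a compact representation of motion, then map speech to that representation) can be illustrated with a minimal sketch. The code below is not the authors' implementation; the PyTorch framework, layer sizes, and feature dimensions (45 pose coordinates, a 40-D latent, MFCC-style audio features) are assumptions for illustration only.

```python
# Illustrative sketch (not the paper's code) of a two-step representation-learning
# pipeline for speech-driven gesture generation, assuming PyTorch is available.
# Step 1: an autoencoder learns a compact representation of 3D pose frames.
# Step 2: a speech encoder is trained to predict that representation from audio features.
import torch
import torch.nn as nn

POSE_DIM = 45        # e.g. 15 joints x 3 coordinates (assumed)
LATENT_DIM = 40      # assumed size of the learned motion representation
SPEECH_DIM = 26      # e.g. MFCC features per frame (assumed)

class MotionAutoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(POSE_DIM, 128), nn.ReLU(),
                                     nn.Linear(128, LATENT_DIM))
        self.decoder = nn.Sequential(nn.Linear(LATENT_DIM, 128), nn.ReLU(),
                                     nn.Linear(128, POSE_DIM))

    def forward(self, pose):
        z = self.encoder(pose)
        return self.decoder(z), z

class SpeechToLatent(nn.Module):
    """Maps per-frame speech features into the learned motion latent space."""
    def __init__(self):
        super().__init__()
        self.rnn = nn.GRU(SPEECH_DIM, 64, batch_first=True)
        self.head = nn.Linear(64, LATENT_DIM)

    def forward(self, speech):          # speech: (batch, frames, SPEECH_DIM)
        hidden, _ = self.rnn(speech)
        return self.head(hidden)        # (batch, frames, LATENT_DIM)

# At synthesis time, predicted latents are decoded back into 3D pose coordinates:
ae, s2l = MotionAutoencoder(), SpeechToLatent()
speech = torch.randn(1, 100, SPEECH_DIM)   # 100 frames of audio features
poses = ae.decoder(s2l(speech))            # (1, 100, POSE_DIM) gesture sequence
```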
Let's face it: Probabilistic multi-modal interlocutor-aware generation of facial gestures in dyadic settings
[article]
2020
arXiv
pre-print
Our contributions are: a) a method for feature extraction from multi-party video and speech recordings, resulting in a representation that allows for independent control and manipulation of expression ...
and speech articulation in a 3D avatar; b) an extension to MoGlow, a recent motion-synthesis method based on normalizing flows, to also take multi-modal signals from the interlocutor as input and subsequently ...
ACKNOWLEDGMENTS The authors would like to acknowledge the support from the Swedish Foundation for Strategic Research, project EACare [33] ...
arXiv:2006.09888v1
fatcat:i3v7cfl4tzgmzgpuboiktr66tq
Gesticulator: A framework for semantically-aware speech-driven gesture generation
[article]
2020
arXiv
pre-print
Our deep-learning based model takes both acoustic and semantic representations of speech as input, and generates gestures as a sequence of joint angle rotations as output. ...
Current end-to-end co-speech gesture generation systems use a single modality for representing speech: either audio or text. ...
[1] proposed to model gestures based not only on the speech of the agent, but also on the speech and gestures of the interlocutor in dyadic conversation. ...
arXiv:2001.09326v3
fatcat:7e3hsuyntncftlzneyaadgvqte
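The key point of this entry is fusing acoustic and semantic speech representations and producing joint-angle rotations. A hedged sketch of that idea follows; it is not the Gesticulator implementation, and the concatenation-based fusion, feature dimensions, and network shape are all assumptions.

```python
# Illustrative sketch (not the Gesticulator code) of fusing acoustic and semantic
# speech representations to predict joint-angle rotations, assuming PyTorch.
import torch
import torch.nn as nn

AUDIO_DIM, TEXT_DIM, N_JOINTS = 26, 300, 15   # e.g. MFCCs, word embeddings (assumed)

class MultimodalGestureNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(AUDIO_DIM + TEXT_DIM, 256), nn.ReLU(),
            nn.Linear(256, N_JOINTS * 3),         # 3 rotation angles per joint
        )

    def forward(self, audio, text):               # both: (batch, frames, dim)
        fused = torch.cat([audio, text], dim=-1)  # frame-aligned concatenation
        return self.net(fused)                    # (batch, frames, N_JOINTS * 3)

audio = torch.randn(1, 100, AUDIO_DIM)            # 100 frames of acoustic features
text = torch.randn(1, 100, TEXT_DIM)              # word embeddings upsampled to frames
rotations = MultimodalGestureNet()(audio, text)
```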
Moving fast and slow: Analysis of representations and post-processing in speech-driven automatic gesture generation
[article]
2021
arXiv
pre-print
Specifically, we extend recent deep-learning-based, data-driven methods for speech-driven gesture generation by incorporating representation learning. ...
Our model takes speech as input and produces gestures as output, in the form of a sequence of 3D coordinates. ...
Speech-driven gesture generation Speech-driven gesture generation differs from other body-motion generation tasks in that the control signal input is computed from speech. ...
arXiv:2007.09170v3
fatcat:fb4kwh52x5hebnmldavxycquni
Moving fast and slow: Analysis of representations and post-processing in speech-driven automatic gesture generation
2020
Zenodo
Specifically, we extend recent deep-learning-based, data-driven methods for speech-driven gesture generation by incorporating representation learning. ...
Our model takes speech as input and produces gestures as output, in the form of a sequence of 3D coordinates. ...
Speech-driven gesture generation Speech-driven gesture generation differs from other body-motion generation tasks in that the control signal input is computed from speech. ...
doi:10.5281/zenodo.4267998
fatcat:bqfkdpfkajan3dqtsyjgfa2bqq
The Nectec Gesture Generation System entry to the GENEA Challenge 2020
[article]
2020
Zenodo
To generate the 3D motion from speech, the 40-D vectors predicted by the system described in Section 2.4 were decoded by the 'gesture decoder' part of the DAEs, resulting in 45 3D coordinates. ...
NECTEC GESTURE GENERATION SYSTEM We propose a gesture generation system based on both text and speech features. This work extends an audio-driven baseline system [5]. ...
doi:10.5281/zenodo.4088629
fatcat:2xf4vnah7bdndny4vc7j5cwhpq
Robots Learn Social Skills: End-to-End Learning of Co-Speech Gesture Generation for Humanoid Robots
[article]
2018
arXiv
pre-print
We present a learning-based co-speech gesture generation method that is learned from 52 hours of TED talks. ...
In a subjective evaluation, participants reported that the gestures were human-like and matched the speech content. We also demonstrate a co-speech gesture with a NAO robot working in real time. ...
Speech audio is a secondary context for gesture generation. Audio-driven gestures have been studied for artificial avatars in video games and virtual reality [13] , [14] . ...
arXiv:1810.12541v1
fatcat:qkorehsfvbay7n363sbzlsputy
VACE Multimodal Meeting Corpus
[chapter]
2006
Lecture Notes in Computer Science
With our focus on multimodality, we investigate the interaction among speech, gesture, posture, and gaze in meetings. For this purpose, a high quality multimodal corpus is being produced. ...
This research has been supported by the Advanced Research and Development Activity ARDA VACEII grant 665661: From Video to Information: Cross-Model Analysis of Planning Meetings. ...
Yingen Xiong, Ying Qiao, Bing Fang, and Dulan Wathugala from Virginia Tech, Dr. ...
doi:10.1007/11677482_4
fatcat:nj4vz67sorfi3f26n5jwvscu24
Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation
[article]
2022
arXiv
pre-print
To fully utilize the rich connections between speech audio and human gestures, we propose a novel framework named Hierarchical Audio-to-Gesture (HA2G) for co-speech gesture generation. ...
To enhance the quality of synthesized gestures, we develop a contrastive learning strategy based on audio-text alignment for better audio representations. ...
Intelligence (CPII) Ltd under the Innovation and Technology Fund, the RIE2020 Industry Alignment Fund-Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from ...
arXiv:2203.13161v1
fatcat:6rnee7ftjjefdpdfpltt34hody
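The contrastive audio-text alignment strategy mentioned in this entry can be sketched as an InfoNCE-style loss over paired audio and text embeddings. This is an illustrative sketch rather than the HA2G implementation; the function name, embedding sizes, and temperature value are assumptions.

```python
# Illustrative sketch (not the HA2G implementation) of a contrastive
# audio-text alignment loss (InfoNCE-style), assuming PyTorch.
import torch
import torch.nn.functional as F

def audio_text_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """audio_emb, text_emb: (batch, dim) embeddings of paired audio/text clips.
    Paired items along the diagonal are positives; all other pairs are negatives."""
    audio_emb = F.normalize(audio_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = audio_emb @ text_emb.t() / temperature   # (batch, batch) similarities
    targets = torch.arange(audio_emb.size(0))
    # Symmetric cross-entropy over both matching directions (audio->text, text->audio).
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = audio_text_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128))
```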
A Survey on Sign Language Recognition
2021
Zenodo
Using hand gestures as the primary input, we convert those into understandable language. ...
The methods used to achieve such feats are a product of Machine Learning (ML) and Artificial Intelligence (AI) techniques. ...
The trained model is then used to detect hand gestures from the edge-detected images, after which suitable text and speech output is provided based on the recognized gesture. ...
doi:10.5281/zenodo.4902966
fatcat:i75iqib3mveaxcd2f6xnu36xxm
Live Speech Driven Head-and-Eye Motion Generators
2012
IEEE Transactions on Visualization and Computer Graphics
Its central idea is to learn separate yet inter-related statistical models for each component (head motion, gaze, or eyelid motion) from a pre-recorded facial motion dataset: i) Gaussian Mixture Models ...
Several user studies are conducted to evaluate the effectiveness of the proposed speech-driven head and eye motion generator using the well-established paired comparison methodology. ...
ACKNOWLEDGMENTS This work is supported in part by NSF IIS-0914965, Texas NHARP 003652-0058-2007, and research gifts from Google and Nokia. ...
doi:10.1109/tvcg.2012.74
pmid:22392712
fatcat:wejkyepl5jdndk7reycgbqhici
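The entry above describes fitting statistical models such as Gaussian Mixture Models to pre-recorded facial motion and driving head motion from live speech. A minimal sketch of that idea follows, assuming scikit-learn and NumPy; the feature layout, component count, and the nearest-mean simplification are assumptions, not the paper's method.

```python
# Illustrative sketch (not the paper's implementation): fit a Gaussian Mixture Model
# to joint speech-feature/head-motion frames, then read off head motion conditioned
# on a new speech frame. Assumes scikit-learn and NumPy; data here is random toy data.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
speech_feats = rng.normal(size=(1000, 13))    # e.g. MFCC-like features per frame (assumed)
head_motion = rng.normal(size=(1000, 3))      # yaw, pitch, roll per frame (assumed)
frames = np.hstack([speech_feats, head_motion])

gmm = GaussianMixture(n_components=8, covariance_type='full').fit(frames)

def predict_head_motion(speech_frame, gmm, n_speech=13):
    """Pick the component whose mean is closest to the speech features (a
    simplification used here for brevity), then return the conditional mean
    of the head-motion dimensions for that component."""
    means_s = gmm.means_[:, :n_speech]
    k = np.argmin(np.linalg.norm(means_s - speech_frame, axis=1))
    mu, cov = gmm.means_[k], gmm.covariances_[k]
    mu_s, mu_m = mu[:n_speech], mu[n_speech:]
    C_ss = cov[:n_speech, :n_speech]
    C_ms = cov[n_speech:, :n_speech]
    # Conditional Gaussian mean: mu_m + C_ms C_ss^{-1} (x_s - mu_s)
    return mu_m + C_ms @ np.linalg.solve(C_ss, speech_frame - mu_s)

print(predict_head_motion(speech_feats[0], gmm))
```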
Showing results 1 — 15 out of 2,981 results