
Learning Speech-driven 3D Conversational Gestures from Video [article]

Ikhsanul Habibie, Weipeng Xu, Dushyant Mehta, Lingjie Liu, Hans-Peter Seidel, Gerard Pons-Moll, Mohamed Elgharib, Christian Theobalt
2021 arXiv   pre-print
We propose the first approach to automatically and jointly synthesize both the synchronous 3D conversational body and hand gestures, as well as 3D face and head animations, of a virtual character from  ...  Synthesis of conversational body gestures is a multi-modal problem since many similar gestures can plausibly accompany the same input speech.  ...  [16] propose a learning-based model for speech-driven generation of 2D upper-body and hand gestures, trained on a large-scale in-the-wild video collection.  ... 
arXiv:2102.06837v1 fatcat:fea24vvphbh4jpuqzclfviq5fe
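
A minimal sketch of the joint-synthesis idea described above: one shared audio encoder with separate output heads for body pose, hand pose, and facial expression parameters. All dimensions (audio feature size, joint and expression parameter counts) are illustrative assumptions, not the authors' model.

```python
# Joint prediction sketch: shared audio encoder, separate body/hand/face heads.
# All sizes below are illustrative assumptions.
import torch
import torch.nn as nn

class JointGestureFaceNet(nn.Module):
    def __init__(self, audio_dim=40, body_dim=63, hand_dim=90, face_dim=64):
        super().__init__()
        self.encoder = nn.GRU(audio_dim, 256, batch_first=True)
        self.body_head = nn.Linear(256, body_dim)
        self.hand_head = nn.Linear(256, hand_dim)
        self.face_head = nn.Linear(256, face_dim)

    def forward(self, audio):           # audio: (batch, frames, audio_dim)
        h, _ = self.encoder(audio)
        return self.body_head(h), self.hand_head(h), self.face_head(h)

net = JointGestureFaceNet()
body, hands, face = net(torch.randn(1, 120, 40))   # 120 frames of audio features
print(body.shape, hands.shape, face.shape)
```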

Realistic Speech-Driven Talking Video Generation with Personalized Pose

Xu Zhang, Liguo Weng, Zhijie Wang
2020 Complexity  
Existing speech-driven speaker methods cannot solve this problem well.  ...  Then, the final speaker video is generated from the obtained gesture key points by the video generation network.  ...  Due to the weak representation ability of the 3D face model parameter network, the key point error obtained from the 3D face model conversion is larger, and the 3D face model needs to be used as an intermediate  ... 
doi:10.1155/2020/6629634 fatcat:rvmbm73y3zdtpexqfvbtcggg54
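
A minimal sketch of the second stage mentioned in the snippet, assuming the gesture key points are rasterized into per-keypoint heatmaps and passed to a convolutional generator that renders a video frame; the resolution, keypoint count, and layer sizes are illustrative, not the paper's network.

```python
# Keypoint-conditioned frame generation sketch: heatmaps in, RGB frame out.
# Channel counts and resolution are illustrative assumptions.
import torch
import torch.nn as nn

class KeypointToFrame(nn.Module):
    def __init__(self, n_keypoints=68):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(n_keypoints, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 3, padding=1), nn.Tanh())   # RGB frame in [-1, 1]

    def forward(self, heatmaps):        # heatmaps: (batch, n_keypoints, H, W)
        return self.net(heatmaps)

gen = KeypointToFrame()
frame = gen(torch.randn(1, 68, 128, 128))
print(frame.shape)                       # torch.Size([1, 3, 128, 128])
```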

Body2Hands: Learning to Infer 3D Hands from Conversational Gesture Body Dynamics [article]

Evonne Ng, Shiry Ginosar, Trevor Darrell, Hanbyul Joo
2021 arXiv   pre-print
We propose a novel learned deep prior of body motion for 3D hand shape synthesis and estimation in the domain of conversational gestures.  ...  Trained with 3D pose estimations obtained from a large-scale dataset of internet videos, our hand prediction model produces convincing 3D hand gestures given only the 3D motion of the speaker's arms as  ...  More recently, [10] uses a GAN to learn a person-specific mapping from speech to 2D upper-body keypoints from in-the-wild videos.  ... 
arXiv:2007.12287v3 fatcat:vgwbijprbvenjjnura3yvohtgu
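
A minimal sketch of the body-to-hands mapping the entry describes: a sequence model that predicts 3D hand joint positions from 3D arm motion alone. The joint counts and the recurrent architecture are assumptions for illustration, not the authors' learned prior.

```python
# Arm-motion-to-hand-pose sketch: arm joint trajectories in, hand joints out.
# Joint counts and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ArmToHands(nn.Module):
    def __init__(self, n_arm_joints=6, n_hand_joints=42):
        super().__init__()
        self.rnn = nn.LSTM(n_arm_joints * 3, 256, batch_first=True)
        self.out = nn.Linear(256, n_hand_joints * 3)

    def forward(self, arm_seq):         # (batch, frames, n_arm_joints * 3)
        h, _ = self.rnn(arm_seq)
        return self.out(h)              # (batch, frames, n_hand_joints * 3)

model = ArmToHands()
hands = model(torch.randn(1, 64, 18))   # 64 frames of 6 arm joints in 3D
print(hands.shape)                      # torch.Size([1, 64, 126])
```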

Learning Individual Styles of Conversational Gesture

Shiry Ginosar, Amir Bar, Gefen Kohavi, Caroline Chan, Andrew Owens, Jitendra Malik
2019 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
Figure 1: Speech-to-gesture translation example. In this paper, we study the connection between conversational gesture and speech.  ...  We present a method for cross-modal translation from "in-the-wild" monologue speech of a single speaker to their conversational gesture motion.  ...  We approach the question from a data-driven learning perspective and ask to what extent we can predict gesture motion from the raw audio signal of speech.  ... 
doi:10.1109/cvpr.2019.00361 dblp:conf/cvpr/GinosarBKCOM19 fatcat:pyvacuabvjfmnf5vbxnim5wgj4
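
A minimal sketch of the general speech-to-gesture regression setup such work studies: log-mel audio frames in, per-frame 2D upper-body keypoints out. The feature sizes, keypoint count, and plain convolutional layers are illustrative assumptions, not the paper's model.

```python
# Speech-to-gesture regression sketch: log-mel frames in, 2D keypoints out.
# Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class AudioToGesture(nn.Module):
    def __init__(self, n_mels=64, n_keypoints=49):
        super().__init__()
        # Temporal convolutions pool local acoustic context.
        self.encoder = nn.Sequential(
            nn.Conv1d(n_mels, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=5, padding=2), nn.ReLU())
        # Per-frame regression to (x, y) for each keypoint.
        self.head = nn.Conv1d(256, n_keypoints * 2, kernel_size=1)

    def forward(self, mel):              # mel: (batch, n_mels, frames)
        h = self.encoder(mel)
        out = self.head(h)               # (batch, n_keypoints * 2, frames)
        b, _, t = out.shape
        return out.view(b, -1, 2, t)     # (batch, n_keypoints, 2, frames)

model = AudioToGesture()
poses = model(torch.randn(1, 64, 128))   # ~128 audio frames
print(poses.shape)                       # torch.Size([1, 49, 2, 128])
```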

Analyzing Input and Output Representations for Speech-Driven Gesture Generation [article]

Taras Kucherenko, Dai Hasegawa, Gustav Eje Henter, Naoshi Kaneko, Hedvig Kjellström
2019 arXiv   pre-print
Specifically, we extend recent deep-learning-based, data-driven methods for speech-driven gesture generation by incorporating representation learning.  ...  Our model takes speech as input and produces gestures as output, in the form of a sequence of 3D coordinates. Our approach consists of two steps.  ...  [11] designed a speech-driven neural network capable of producing 3D motion sequences.  ... 
arXiv:1903.03369v3 fatcat:7fqzyohqevduta26qppusskame
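
A minimal sketch of the two-step approach the abstract outlines, assuming a motion autoencoder is trained first and a speech network then predicts the learned representation, which is decoded to 3D joint coordinates; all dimensions and layer choices are illustrative, not the authors' exact networks.

```python
# Two-step representation-learning sketch: (1) motion autoencoder,
# (2) speech-to-latent mapping decoded to 3D coordinates. Sizes are assumptions.
import torch
import torch.nn as nn

class MotionAutoencoder(nn.Module):
    def __init__(self, pose_dim=45, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(pose_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, pose_dim))

    def forward(self, pose):
        z = self.encoder(pose)
        return self.decoder(z), z

class SpeechToLatent(nn.Module):
    def __init__(self, speech_dim=26, latent_dim=32):
        super().__init__()
        self.net = nn.GRU(speech_dim, latent_dim, batch_first=True)

    def forward(self, speech):           # speech: (batch, frames, speech_dim)
        z, _ = self.net(speech)
        return z                         # per-frame latent motion codes

ae = MotionAutoencoder()
s2l = SpeechToLatent()
speech = torch.randn(1, 100, 26)         # 100 frames of MFCC-like features
poses = ae.decoder(s2l(speech))          # (1, 100, 45): 15 joints x 3D coords
print(poses.shape)
```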

Let's face it: Probabilistic multi-modal interlocutor-aware generation of facial gestures in dyadic settings [article]

Patrik Jonell, Taras Kucherenko, Gustav Eje Henter, Jonas Beskow
2020 arXiv   pre-print
Our contributions are: a) a method for feature extraction from multi-party video and speech recordings, resulting in a representation that allows for independent control and manipulation of expression  ...  and speech articulation in a 3D avatar; b) an extension to MoGlow, a recent motion-synthesis method based on normalizing flows, to also take multi-modal signals from the interlocutor as input and subsequently  ...  ACKNOWLEDGMENTS The authors would like to acknowledge the support from the Swedish Foundation for Strategic Research, project EACare [33]  ... 
arXiv:2006.09888v1 fatcat:i3v7cfl4tzgmzgpuboiktr66tq
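
Flow-based motion models such as MoGlow are built from invertible transformations conditioned on control features; the entry describes extending the conditioning to multi-modal interlocutor signals. Below is a minimal sketch of one conditional affine coupling step with interlocutor features concatenated into the condition vector; the dimensions and the simple MLP are illustrative assumptions, not the paper's architecture.

```python
# Conditional affine coupling sketch: the condition vector combines the agent's
# own speech features with interlocutor features. Sizes are assumptions.
import torch
import torch.nn as nn

class ConditionalCoupling(nn.Module):
    def __init__(self, motion_dim=50, cond_dim=80):
        super().__init__()
        self.half = motion_dim // 2
        # Predict scale and shift for the second half from the first half + condition.
        self.net = nn.Sequential(
            nn.Linear(self.half + cond_dim, 128), nn.ReLU(),
            nn.Linear(128, (motion_dim - self.half) * 2))

    def forward(self, x, cond):
        x_a, x_b = x[:, :self.half], x[:, self.half:]
        scale, shift = self.net(torch.cat([x_a, cond], dim=-1)).chunk(2, dim=-1)
        y_b = x_b * torch.exp(scale) + shift      # invertible affine transform
        log_det = scale.sum(dim=-1)               # contribution to the flow likelihood
        return torch.cat([x_a, y_b], dim=-1), log_det

# Condition = own speech features + interlocutor speech/facial features (assumed split).
own_speech, interlocutor = torch.randn(4, 40), torch.randn(4, 40)
layer = ConditionalCoupling()
y, log_det = layer(torch.randn(4, 50), torch.cat([own_speech, interlocutor], dim=-1))
print(y.shape, log_det.shape)
```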

Gesticulator: A framework for semantically-aware speech-driven gesture generation [article]

Taras Kucherenko, Patrik Jonell, Sanne van Waveren, Gustav Eje Henter, Simon Alexanderson, Iolanda Leite, Hedvig Kjellström
2020 arXiv   pre-print
Our deep-learning based model takes both acoustic and semantic representations of speech as input, and generates gestures as a sequence of joint angle rotations as output.  ...  Current end-to-end co-speech gesture generation systems use a single modality for representing speech: either audio or text.  ...  [1] proposed to model gestures based not only on the speech of the agent, but also on the speech and gestures of the interlocutor in dyadic conversation.  ... 
arXiv:2001.09326v3 fatcat:7e3hsuyntncftlzneyaadgvqte
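
A minimal sketch of the multimodal-input idea: per-frame acoustic features and word-level text embeddings are fused and mapped to joint-angle rotations. The feature sizes (e.g. 768-d BERT-like text vectors) and joint count are assumptions, not the Gesticulator architecture itself.

```python
# Acoustic + semantic fusion sketch: concatenated per-frame features are mapped
# to joint-angle rotations. Sizes are illustrative assumptions.
import torch
import torch.nn as nn

class MultimodalGestureNet(nn.Module):
    def __init__(self, audio_dim=26, text_dim=768, n_joints=15):
        super().__init__()
        self.fusion = nn.Sequential(
            nn.Linear(audio_dim + text_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU())
        self.out = nn.Linear(256, n_joints * 3)   # 3 rotation angles per joint

    def forward(self, audio, text):
        # audio: (batch, frames, audio_dim); text: (batch, frames, text_dim),
        # e.g. word embeddings repeated over each word's duration.
        h = self.fusion(torch.cat([audio, text], dim=-1))
        return self.out(h)                         # (batch, frames, n_joints * 3)

net = MultimodalGestureNet()
rotations = net(torch.randn(2, 60, 26), torch.randn(2, 60, 768))
print(rotations.shape)                             # torch.Size([2, 60, 45])
```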

Moving fast and slow: Analysis of representations and post-processing in speech-driven automatic gesture generation [article]

Taras Kucherenko, Dai Hasegawa, Naoshi Kaneko, Gustav Eje Henter, Hedvig Kjellström
2021 arXiv   pre-print
Specifically, we extend recent deep-learning-based, data-driven methods for speech-driven gesture generation by incorporating representation learning.  ...  Our model takes speech as input and produces gestures as output, in the form of a sequence of 3D coordinates.  ...  Speech-driven gesture generation Speech-driven gesture generation differs from other body-motion generation tasks in that the control signal input is computed from speech.  ... 
arXiv:2007.09170v3 fatcat:fb4kwh52x5hebnmldavxycquni
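
The title mentions post-processing of the generated motion; a common form of such post-processing is temporal smoothing of the coordinate trajectories. The sketch below applies a Savitzky-Golay filter as one such choice; the filter type and its window and order are assumptions, not necessarily what the authors used.

```python
# Post-processing sketch: smooth each generated coordinate trajectory over time.
# Savitzky-Golay is one common choice; window/order values are illustrative.
import numpy as np
from scipy.signal import savgol_filter

def smooth_motion(poses, window=9, polyorder=3):
    """poses: (frames, n_coords) array of generated 3D joint coordinates."""
    return savgol_filter(poses, window_length=window, polyorder=polyorder, axis=0)

raw = np.cumsum(np.random.randn(200, 45), axis=0)   # jittery stand-in trajectories
smoothed = smooth_motion(raw)
print(raw.shape, smoothed.shape)                     # (200, 45) (200, 45)
```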

Moving fast and slow: Analysis of representations and post-processing in speech-driven automatic gesture generation

Taras Kucherenko, Dai Hasegawa, Naoshi Kaneko, Gustav Eje Henter, Hedvig Kjellström
2020 Zenodo  
Specifically, we extend recent deep-learning-based, data-driven methods for speech-driven gesture generation by incorporating representation learning.  ...  Our model takes speech as input and produces gestures as output, in the form of a sequence of 3D coordinates.  ...  Speech-driven gesture generation Speech-driven gesture generation differs from other body-motion generation tasks in that the control signal input is computed from speech.  ... 
doi:10.5281/zenodo.4267998 fatcat:bqfkdpfkajan3dqtsyjgfa2bqq

The Nectec Gesture Generation System entry to the GENEA Challenge 2020 [article]

Ausdang Thangthai, Kwanchiva Thangthai, Arnon Namsanit, Sumonmas Thatphithakkul, Sittipong Saychum
2020 Zenodo  
To generate the 3D motion from speech, the 40-D vectors predicted from the system described in Section 2.4 were decoded by the 'gesture decoder' part of the DAEs, resulting in 45 3D coordinates.  ...  NECTEC GESTURE GENERATION SYSTEM We propose a gesture generation system based on both text and speech features. This work extends an audio-driven baseline system [5].  ... 
doi:10.5281/zenodo.4088629 fatcat:2xf4vnah7bdndny4vc7j5cwhpq
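
A minimal sketch of the decoding step the snippet describes: 40-D predicted vectors are passed through the decoder half of a (denoising) autoencoder to recover 45 coordinate values (15 joints x 3). The layer widths are assumptions, not the system's actual decoder.

```python
# Gesture-decoder sketch: 40-D motion codes decoded to 45 joint coordinates.
# Hidden width is an illustrative assumption.
import torch
import torch.nn as nn

gesture_decoder = nn.Sequential(
    nn.Linear(40, 128), nn.ReLU(),
    nn.Linear(128, 45))                  # 45 = 15 joints * (x, y, z)

predicted_codes = torch.randn(100, 40)   # 100 frames of 40-D predictions
joint_coords = gesture_decoder(predicted_codes).view(100, 15, 3)
print(joint_coords.shape)                # torch.Size([100, 15, 3])
```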

Robots Learn Social Skills: End-to-End Learning of Co-Speech Gesture Generation for Humanoid Robots [article]

Youngwoo Yoon, Woo-Ri Ko, Minsu Jang, Jaeyeon Lee, Jaehong Kim, Geehyuk Lee
2018 arXiv   pre-print
We present a learning-based co-speech gesture generation model that is learned from 52 h of TED talks.  ...  In a subjective evaluation, participants reported that the gestures were human-like and matched the speech content. We also demonstrate co-speech gestures with a NAO robot working in real time.  ...  Speech audio is a secondary context for gesture generation. Audio-driven gestures have been studied for artificial avatars in video games and virtual reality [13], [14].  ... 
arXiv:1810.12541v1 fatcat:qkorehsfvbay7n363sbzlsputy
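
A minimal sketch of an end-to-end text-to-gesture sequence model of the kind described: an encoder over word embeddings and a decoder emitting a pose per output frame. The vocabulary, dimensions, and the mean-pooled context (a crude stand-in for attention) are illustrative assumptions, not the authors' network.

```python
# Text-to-gesture seq2seq sketch: word ids in, per-frame upper-body poses out.
# Vocabulary size, dimensions, and pooling are illustrative assumptions.
import torch
import torch.nn as nn

class TextToGesture(nn.Module):
    def __init__(self, vocab_size=20000, emb_dim=300, pose_dim=30):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.GRU(emb_dim, 256, batch_first=True, bidirectional=True)
        self.decoder = nn.GRU(512, 256, batch_first=True)
        self.out = nn.Linear(256, pose_dim)

    def forward(self, word_ids, n_frames):
        enc, _ = self.encoder(self.embed(word_ids))       # (batch, words, 512)
        # Repeat the mean encoder state over the output frames and decode a pose
        # per frame (a simple stand-in for an attention mechanism).
        ctx = enc.mean(dim=1, keepdim=True).repeat(1, n_frames, 1)
        dec, _ = self.decoder(ctx)
        return self.out(dec)                              # (batch, n_frames, pose_dim)

model = TextToGesture()
poses = model(torch.randint(0, 20000, (1, 12)), n_frames=40)
print(poses.shape)                                        # torch.Size([1, 40, 30])
```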

VACE Multimodal Meeting Corpus [chapter]

Lei Chen, R. Travis Rose, Ying Qiao, Irene Kimbara, Fey Parrill, Haleema Welji, Tony Xu Han, Jilin Tu, Zhongqiang Huang, Mary Harper, Francis Quek, Yingen Xiong (+3 others)
2006 Lecture Notes in Computer Science  
With our focus on multimodality, we investigate the interaction among speech, gesture, posture, and gaze in meetings. For this purpose, a high-quality multimodal corpus is being produced.  ...  This research has been supported by the Advanced Research and Development Activity ARDA VACEII grant 665661: From Video to Information: Cross-Modal Analysis of Planning Meetings.  ...  Yingen Xiong, Ying Qiao, Bing Fang, and Dulan Wathugala from Virginia Tech, Dr.  ... 
doi:10.1007/11677482_4 fatcat:nj4vz67sorfi3f26n5jwvscu24

Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation [article]

Xian Liu, Qianyi Wu, Hang Zhou, Yinghao Xu, Rui Qian, Xinyi Lin, Xiaowei Zhou, Wayne Wu, Bo Dai, Bolei Zhou
2022 arXiv   pre-print
To fully utilize the rich connections between speech audio and human gestures, we propose a novel framework named Hierarchical Audio-to-Gesture (HA2G) for co-speech gesture generation.  ...  To enhance the quality of synthesized gestures, we develop a contrastive learning strategy based on audio-text alignment for better audio representations.  ...  Intelligence (CPII) Ltd under the Innovation and Technology Fund, the RIE2020 Industry Alignment Fund-Industry Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from  ... 
arXiv:2203.13161v1 fatcat:6rnee7ftjjefdpdfpltt34hody
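
A minimal sketch of a symmetric InfoNCE-style contrastive loss between paired audio and text embeddings, one common form of "contrastive learning based on audio-text alignment"; the embedding size and temperature are illustrative, and this is not claimed to be the HA2G training objective itself.

```python
# Symmetric contrastive loss sketch: matched audio/text pairs lie on the
# diagonal of the similarity matrix. Dimensions and temperature are assumptions.
import torch
import torch.nn.functional as F

def audio_text_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """audio_emb, text_emb: (batch, dim), paired row-wise."""
    a = F.normalize(audio_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = a @ t.T / temperature                 # pairwise similarities
    targets = torch.arange(a.size(0))              # matched pairs on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

loss = audio_text_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```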

A Survey on Sign Language Recognition

Abilash S, Ashish Ashish, Shreyas K S, Shreyas S. Nadig, Vallabh Mahale
2021 Zenodo  
Using hand gestures as the primary input, we convert those into understandable language.  ...  The methods used to achieve such feats are a product of Machine Learning (ML) and Artificial Intelligence (AI) techniques.  ...  The trained model is now used to detect hand gestures from the edge detected images after which suitable text and speech output is provided based on the gesture recognized.  ... 
doi:10.5281/zenodo.4902966 fatcat:i75iqib3mveaxcd2f6xnu36xxm
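
A minimal sketch of the recognition pipeline the snippet outlines: edge-detect a hand image, then classify the edge map into a gesture class that can be mapped to text. The edge detector, the small classifier, and the class count (26 letter signs) are illustrative assumptions, not any specific system from the survey.

```python
# Edge-detection + gesture-classification sketch. Image size, network, and
# number of classes are illustrative assumptions.
import numpy as np
import cv2
import torch
import torch.nn as nn

classifier = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(16 * 32 * 32, 26))      # e.g. 26 letter signs

frame = (np.random.rand(64, 64) * 255).astype(np.uint8)   # stand-in camera frame
edges = cv2.Canny(frame, threshold1=100, threshold2=200)  # edge-detected image
logits = classifier(torch.from_numpy(edges).float().unsqueeze(0).unsqueeze(0) / 255)
gesture_id = logits.argmax(dim=1).item()                  # predicted gesture class
print(gesture_id)
```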

Live Speech Driven Head-and-Eye Motion Generators

B. H. Le, Xiaohan Ma, Zhigang Deng
2012 IEEE Transactions on Visualization and Computer Graphics  
Its central idea is to learn separate yet inter-related statistical models for each component (head motion, gaze, or eyelid motion) from a pre-recorded facial motion dataset: i) Gaussian Mixture Models  ...  Several user studies are conducted to evaluate the effectiveness of the proposed speech-driven head and eye motion generator using the well-established paired comparison methodology.  ...  ACKNOWLEDGMENTS This work is supported in part by NSF IIS-0914965, Texas NHARP 003652-0058-2007, and research gifts from Google and Nokia.  ... 
doi:10.1109/tvcg.2012.74 pmid:22392712 fatcat:wejkyepl5jdndk7reycgbqhici
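
A minimal sketch of fitting separate Gaussian mixture models per motion component, in the spirit of the abstract: joint (speech-feature, motion) GMMs for head rotation and gaze are fit with scikit-learn, so that motion can later be sampled conditioned on live speech features. The component counts and feature choices are assumptions, not the paper's models.

```python
# Per-component GMM sketch: one mixture over (speech features, motion) for head
# rotation and one for gaze direction. Component counts are illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
speech_feats = rng.normal(size=(1000, 13))          # e.g. per-frame MFCCs
head_rot = rng.normal(size=(1000, 3))               # yaw, pitch, roll
gaze_dir = rng.normal(size=(1000, 2))               # horizontal, vertical angle

# Joint models over (speech, motion) allow conditional sampling of motion
# given live speech features at synthesis time.
gmm_head = GaussianMixture(n_components=8).fit(np.hstack([speech_feats, head_rot]))
gmm_gaze = GaussianMixture(n_components=8).fit(np.hstack([speech_feats, gaze_dir]))
print(gmm_head.means_.shape, gmm_gaze.means_.shape)
```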
Showing results 1 — 15 out of 2,981 results