2,274 Hits in 3.5 sec

Speech Drives Templates: Co-Speech Gesture Synthesis with Learned Templates [article]

Shenhan Qian, Zhi Tu, Yihao Zhi, Wen Liu, Shenghua Gao
2021 arXiv   pre-print
Co-speech gesture generation aims to synthesize a gesture sequence that not only looks realistic but also matches the input speech audio.  ...  Motivated by the fact that speech cannot fully determine the gesture, we design a method that learns a set of gesture template vectors to model the latent conditions, which relieves the ambiguity.  ...  We thank UniDT (Shanghai) Co., Ltd for their assistance in collecting data from Mandarin speakers.  ... 
arXiv:2108.08020v2 fatcat:dv5edmawb5e6ja2ma562jmnkfm

Prosody modeling with soft templates

Greg Kochanski, Chilin Shih
2003 Speech Communication  
In this paper, we introduce a prosody tagging and generation system, the Soft TEMplate Mark-up Language (Stem-ML).  ...  This capability is crucial to the next generation of text-to-speech systems, which will need to synthesize intonation variations for different speech acts, emotions, and styles of speech.  ...  This state information needs to be expressed prosodically, so one should think of speech synthesis more in the context of a concept-to-speech system than a text-to-speech system.  ... 
doi:10.1016/s0167-6393(02)00047-x fatcat:pow5ucslgzblndbung45swwvcy

Lifelike Gesture Synthesis and Timing for Conversational Agents [chapter]

Ipke Wachsmuth, Stefan Kopp
2002 Lecture Notes in Computer Science  
The model is conceived to enable cross-modal synchrony with respect to the coordination of gestures with the signal generated by a text-to-speech system.  ...  Synthesis of lifelike gestures is attracting growing attention in human-computer interaction.  ...  A lot of progress has been made in combining speech synthesis with facial animation to bring about lip-synchronous speech, as with so-called talking heads [10].  ... 
doi:10.1007/3-540-47873-6_13 fatcat:6on5hdhhvnfpldf3tp4zaeeaju

Speaking with hands

Matthew Stone, Doug DeCarlo, Insuk Oh, Christian Rodriguez, Adrian Stere, Alyssa Lees, Chris Bregler
2004 ACM Transactions on Graphics  
They recombine motion samples with new speech samples to recreate coherent phrases, and blend segments of speech and motion together phrase-by-phrase into extended utterances.  ...  By framing problems for utterance generation and synthesis so that they can draw closely on a talented performance, our techniques support the rapid construction of animated characters with rich and appropriate  ...  Thanks to Kate Brehm for her wonderful realization of our character, to Loren Runcie and Jared Silver for help with animation, and to Electronic Arts for the use of Zoe from SSX 3.  ... 
doi:10.1145/1015706.1015753 fatcat:lbvcjr4wkraatmx74qr3frvmaq

Speaking with hands

Matthew Stone, Doug DeCarlo, Insuk Oh, Christian Rodriguez, Adrian Stere, Alyssa Lees, Chris Bregler
2004 ACM SIGGRAPH 2004 Papers on - SIGGRAPH '04  
They recombine motion samples with new speech samples to recreate coherent phrases, and blend segments of speech and motion together phrase-by-phrase into extended utterances.  ...  By framing problems for utterance generation and synthesis so that they can draw closely on a talented performance, our techniques support the rapid construction of animated characters with rich and appropriate  ...  Thanks to Kate Brehm for her wonderful realization of our character, to Loren Runcie and Jared Silver for help with animation, and to Electronic Arts for the use of Zoe from SSX 3.  ... 
doi:10.1145/1186562.1015753 fatcat:ei6gkvwrurffhe7t25nz4iwbba

Freeform Body Motion Generation from Speech [article]

Jing Xu, Wei Zhang, Yalong Bai, Qibin Sun, Tao Mei
2022 arXiv   pre-print
Code and pre-trained models will be publicly available through https://github.com/TheTempAccount/Co-Speech-Motion-Generation.  ...  Motivated by studies in linguistics, we decompose the co-speech motion into two complementary parts: pose modes and rhythmic dynamics.  ...  network for speech to gesture generation. • Speech Drives Template (Tmpt) [23] learns a set of gesture templates to relieve the ambiguity of the mapping from speech to body motion. • Trimodal-Context  ... 
arXiv:2203.02291v1 fatcat:aqzd5yqpbjebzlubngoqh4gy4y

MeshTalk: 3D Face Animation from Speech using Cross-Modality Disentanglement [article]

Alexander Richard, Michael Zollhoefer, Yandong Wen, Fernando de la Torre, Yaser Sheikh
2021 arXiv   pre-print
This paper presents a generic method for generating full facial 3D animation from speech.  ...  To improve upon existing models, we propose a generic audio-driven facial animation approach that achieves highly realistic motion synthesis results for the entire face.  ...  Psychological literature has observed that there is an important degree of dependency between speech and facial gestures.  ... 
arXiv:2104.08223v1 fatcat:wawhw2hr4rbcng4veeyxgl3p4i

Multimodal Affect Analysis for Product Feedback Assessment [article]

Amol S Patwardhan, Gerald M Knapp
2017 arXiv   pre-print
Classification is done by an emotion template mapping algorithm and by training a classifier using support vector machines.  ...  multimodal affect recognition system developed to classify whether a consumer likes or dislikes a product tested at a counter or kiosk, by analyzing the consumer's facial expression, body posture, and hand gestures  ...  Emotion synthesis using a 3D virtual agent was evaluated in [13], and facial expressions, gestures, and posture were used for emotion recognition.  ... 
arXiv:1705.02694v1 fatcat:aijbeawcqzhkpnawekzivehu4i

Fully generated scripted dialogue for embodied agents

Kees van Deemter, Brigitte Krenn, Paul Piwek, Martin Klesen, Marc Schröder, Stefan Baumann
2008 Artificial Intelligence  
automated construction of an abstract script for an entire dialogue (cast in terms of dialogue acts), which is incrementally enhanced by a series of modules and finally "performed" by means of text, speech  ...  The trade-offs facing language generation, speech synthesis, and gesture assignment are similar.  ... 
doi:10.1016/j.artint.2008.02.002 fatcat:bpu4uu3vvbgnbowf5u35ftmhzu

Perceptive animated interfaces: first steps toward a new paradigm for human-computer interaction

R. Cole, S. Van Vuuren, B. Pellom, K. Hacioglu, Jiyong Ma, J. Movellan, S. Schwartz, D. Wade-Stein, W. Ward, Jie Yan
2003 Proceedings of the IEEE  
This paper presents a vision of the near future in which computer interaction is characterized by natural face-to-face conversations with lifelike characters that speak, emote, and gesture.  ...  In support of this hypothesis, we first describe initial experiments using an animated character to teach speech and language skills to children with hearing problems, and classroom subjects and social  ...  speech, facial expressions, and hand and body gestures in synchrony with the speech waveform.  ... 
doi:10.1109/jproc.2003.817143 fatcat:awqjxnxwkbefzcoewu7bhy7ie4

Passing a Non-verbal Turing Test: Evaluating Gesture Animations Generated from Speech [article]

Manuel Rebol and Christian Gütl and Krzysztof Pietroszek
2021 arXiv   pre-print
We create a large dataset consisting of speech and corresponding gestures in a 3D human pose format, from which our model learns the speaker-specific correlation.  ...  Yet, when users are represented as avatars, it is difficult to translate non-verbal signals along with the speech into the virtual world without specialized motion-capture hardware.  ...  For the second task, the null hypothesis is that the speech fragment is not correlated with the synthesised gesture.  ... 
arXiv:2107.00712v1 fatcat:ugkw7qoqpnfn5g3rrezguwl6ui

A nonparametric regression model for virtual humans generation

Yun-Feng Chou, Zen-Chung Shih
2009 Multimedia tools and applications  
with input speech.  ...  The results show that our method effectively simulates plausible movements for character animation, including body movement simulation, novel view synthesis, and expressive facial animation synchronized  ...  Viseme synthesis with an expressive face: for viseme synthesis, viseme segmentation of the speech data is performed to determine all visemes and their durations.  ... 
doi:10.1007/s11042-009-0412-7 fatcat:j7pwtll45bcddif76ybvwyykm4

Vision Based Speech Animation Transferring with Underlying Anatomical Structure [chapter]

Yuru Pei, Hongbin Zha
2006 Lecture Notes in Computer Science  
Unsupervised learning is applied to a speech video corpus to find the underlying manifold of facial configurations.  ...  With parsimonious data requirements, our system realizes animation transfer and achieves a realistic rendering effect with the underlying anatomical structure.  ...  The underlying skull movement is recovered concurrently with facial animation synthesis. Then the audio track is resynchronized with the 3D speech animation and played back.  ... 
doi:10.1007/11612032_60 fatcat:iyrhpyqi3rbzjegqezhul6yynu

Multimodal Interactions with Agents in Virtual Worlds [chapter]

Anton Nijholt, Joris Hulstijn
2000 Studies in Fuzziness and Soft Computing  
The system's feedback is given using speech synthesis. We also have Karen, an information agent that allows a natural language dialogue with the user.  ...  others and can be provided with information and contacts in accordance with their preferences.  ...  Synchronizing the apparently random movements with fixed speech from templates is difficult.  ... 
doi:10.1007/978-3-7908-1856-7_8 fatcat:yt53pdbdszhrnoggyvvy4huji4

Speech-driven Animation with Meaningful Behaviors [article]

Najmeh Sadoughi, Carlos Busso
2017 arXiv   pre-print
., questions), the model learns the characteristic behaviors associated with a given discourse class, learning the rules from the data.  ...  Rule-based methods focus on creating meaningful behaviors conveying the underlying message, but the gestures cannot be easily synchronized with speech.  ...  During synthesis, the models will create novel realizations of these gestures that are timely synchronized with speech.  ... 
arXiv:1708.01640v1 fatcat:5tarjgyon5eo5obumtipmnk77m
Showing results 1–15 out of 2,274 results