Speech Gesture Generation from the Trimodal Context of Text, Audio, and Speaker Identity
[article]
2020
pre-print
In this paper, we present an automatic gesture generation model that uses the multimodal context of speech text, audio, and speaker identity to reliably generate gestures. ...
All the code and data is available at https://github.com/ai4r/Gesture-Generation-from-Trimodal-Context. ...
Resources supporting this work were provided by the 'Ministry of Science and ICT' and NIPA ("HPC Support" Project). ...
doi:10.1145/3414685.3417838
arXiv:2009.02119v1
fatcat:7xm4shpylbfz7oowsor46nsld4
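The released implementation for this entry is PyTorch-based (see the repository linked above). As a rough illustrative sketch only, and not the authors' actual architecture, trimodal conditioning can be pictured as concatenating per-frame text and audio features with a learned speaker-style embedding before decoding pose frames; all module names and dimensions below are assumptions.

```python
import torch
import torch.nn as nn

class TrimodalGestureSketch(nn.Module):
    """Toy fusion of text, audio, and speaker-identity features into pose frames."""

    def __init__(self, text_dim=300, audio_dim=128, n_speakers=100,
                 style_dim=16, hidden=256, pose_dim=27):
        super().__init__()
        self.speaker_embedding = nn.Embedding(n_speakers, style_dim)
        self.rnn = nn.GRU(text_dim + audio_dim + style_dim, hidden, batch_first=True)
        self.pose_out = nn.Linear(hidden, pose_dim)

    def forward(self, text_feat, audio_feat, speaker_id):
        # text_feat: (B, T, text_dim), audio_feat: (B, T, audio_dim), speaker_id: (B,)
        style = self.speaker_embedding(speaker_id)                     # (B, style_dim)
        style = style.unsqueeze(1).expand(-1, text_feat.size(1), -1)   # repeat over time
        fused = torch.cat([text_feat, audio_feat, style], dim=-1)
        hidden_states, _ = self.rnn(fused)
        return self.pose_out(hidden_states)                            # (B, T, pose_dim)
```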
Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation
[article]
2022
arXiv
pre-print
To fully utilize the rich connections between speech audio and human gestures, we propose a novel framework named Hierarchical Audio-to-Gesture (HA2G) for co-speech gesture generation. ...
To enhance the quality of synthesized gestures, we develop a contrastive learning strategy based on audio-text alignment for better audio representations. ...
Collaboration Projects (IAF-ICP) Funding Initiative, as well as cash and in-kind contribution from the industry partner(s). ...
arXiv:2203.13161v1
fatcat:6rnee7ftjjefdpdfpltt34hody
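The snippet above mentions a contrastive learning strategy based on audio-text alignment. One generic way to implement such an alignment objective (not necessarily the HA2G formulation) is a symmetric InfoNCE loss over paired audio and text embeddings; the function and the temperature value below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def audio_text_contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE: the matching audio/text pair at each index is the positive."""
    audio_emb = F.normalize(audio_emb, dim=-1)        # (B, D)
    text_emb = F.normalize(text_emb, dim=-1)          # (B, D)
    logits = audio_emb @ text_emb.t() / temperature   # (B, B) cosine-similarity matrix
    targets = torch.arange(audio_emb.size(0), device=audio_emb.device)
    loss_a2t = F.cross_entropy(logits, targets)       # audio -> text direction
    loss_t2a = F.cross_entropy(logits.t(), targets)   # text -> audio direction
    return 0.5 * (loss_a2t + loss_t2a)
```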
Freeform Body Motion Generation from Speech
[article]
2022
arXiv
pre-print
Body motion generation from speech is inherently difficult due to the non-deterministic mapping from speech to body motions. ...
Extensive experiments demonstrate the superior performance against several baselines, in terms of motion diversity, quality and syncing with speech. ...
network for speech to gesture generation. • Speech Drives Template (Tmpt) [23] learns a set of gesture templates to relieve the ambiguity of the mapping from speech to body motion. • Trimodal-Context ...
arXiv:2203.02291v1
fatcat:aqzd5yqpbjebzlubngoqh4gy4y
Speech2AffectiveGestures: Synthesizing Co-Speech Gestures with Generative Adversarial Affective Expression Learning
[article]
2021
arXiv
pre-print
Our network consists of two components: a generator to synthesize gestures from a joint embedding space of features encoded from the input speech and the seed poses, and a discriminator to distinguish ...
We leverage the Mel-frequency cepstral coefficients and the text transcript computed from the input speech in separate encoders in our generator to learn the desired sentiments and the associated affective ...
Acknowledgment This work has been supported in part by ARO Grants W911NF1910069 and W911NF1910315, and Intel. ...
arXiv:2108.00262v1
fatcat:qkdnpmnldze6pjnnxlnnahix4e
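The generator described above encodes Mel-frequency cepstral coefficients computed from the input speech. As a minimal, hedged example of producing such features with librosa (cited later in this listing), using an assumed file path and parameter values:

```python
import librosa

# Placeholder audio file and parameters, for illustration only.
y, sr = librosa.load("speech_sample.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # shape: (13, n_frames)
print(mfcc.shape)
```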
Joint Audio-Text Model for Expressive Speech-Driven 3D Facial Animation
[article]
2021
arXiv
pre-print
The existing datasets are collected to cover as many different phonemes as possible instead of sentences, thus limiting the capability of the audio-based model to learn more diverse contexts. ...
In contrast to prior approaches which learn phoneme-level features from the text, we investigate the high-level contextual text features for speech-driven 3D facial animation. ...
Battenberg, E.; and Nieto, O. 2015. librosa: Audio and music signal analysis in Python. ...
Lee, G. 2020. Speech gesture generation from the trimodal context of text, audio, and speaker identity. ...
arXiv:2112.02214v2
fatcat:77tyq4cslfatrghj7aypwnmnuy
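"High-level contextual text features" here means token representations from a pretrained language model rather than phoneme-level features. The sketch below uses a generic pretrained encoder from the transformers library; the specific checkpoint is an assumption, not necessarily the model used in the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Hypothetical encoder choice; the paper's actual text model may differ.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

tokens = tokenizer("I am really happy to see you again", return_tensors="pt")
with torch.no_grad():
    outputs = model(**tokens)
contextual_features = outputs.last_hidden_state   # (1, seq_len, 768): one vector per token
```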
Multimodal Sentiment Analysis: Addressing Key Issues and Setting up the Baselines
[article]
2019
arXiv
pre-print
We also discuss some major issues, frequently ignored in multimodal sentiment analysis research, e.g., role of speaker-exclusive models, importance of different modalities, and generalizability. ...
This framework illustrates the different facets of analysis to be considered while performing multimodal sentiment analysis and, hence, serves as a new benchmark for future research in this emerging field ...
[15] fused information from audio, visual and text modalities to extract emotion and sentiment. Metallinou et al. [9] fused audio and text modalities for emotion recognition. ...
arXiv:1803.07427v2
fatcat:jytchjl3gnbpjkyvp4kb3ih5tu
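The entry above discusses fusing audio, visual, and text modalities for sentiment prediction. The simplest baseline fusion (feature-level concatenation followed by a standard classifier) is sketched below with random placeholder features; it is not the paper's specific model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical pre-extracted utterance-level features for each modality.
rng = np.random.default_rng(0)
audio_feat = rng.random((200, 64))
visual_feat = rng.random((200, 128))
text_feat = rng.random((200, 300))
labels = rng.integers(0, 2, size=200)     # binary sentiment labels

# Early (feature-level) fusion: concatenate modalities, then classify.
fused = np.concatenate([audio_feat, visual_feat, text_feat], axis=1)
clf = LogisticRegression(max_iter=1000).fit(fused, labels)
print("training accuracy:", clf.score(fused, labels))
```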
Deep Multimodal Emotion Recognition on Human Speech: A Review
2021
Applied Sciences
This work reviews the state of the art in multimodal speech emotion recognition methodologies, focusing on audio, text and visual information. ...
, although in one of the unimodal or multimodal interactions; and (iii) temporal architectures (TA), which try to capture both unimodal and cross-modal temporal dependencies. ...
The authors propose a deep architecture for the problem of speech emotion recognition, and thus they consider the two modalities of audio and text. ...
doi:10.3390/app11177962
fatcat:cezjfmjmvbgapo3tdz5j3iecp4
A review of affective computing: From unimodal analysis to multimodal fusion
2017
Information Fusion
Multimodality is defined by the presence of more than one modality or channel, e.g., visual, audio, text, gestures, and eye gaze. ...
In this paper, we focus mainly on the use of audio, visual and text information for multimodal affect analysis, since around 90% of the relevant literature appears to cover these three modalities. ...
] aimed to integrate information from facial expressions, body movement, gestures and speech, for recognition of eight basic emotions. ...
doi:10.1016/j.inffus.2017.02.003
fatcat:ytebhjxlz5bvxcdghg4wxbvr6a
Self-reference in early speech of children speaking Slovak
2018
Journal of Language and Cultural Education
A child's speech is examined from the very first occurrence of a self-reference means in the 16th month up to the upper limit of early age (the 36th month), based on transcripts of audio-visual recordings ...
The study focuses on the process of becoming aware of one's own I in children acquiring the Slovak language at an early age and living in a Slovak family. ...
Acknowledgment This work was supported by the project VEGA 1/0099/16 Personal and Social Deixis in Slovak Language. ...
doi:10.2478/jolace-2018-0013
fatcat:o47z3fios5b4vky4stcrllzywu
Speech technology for unwritten languages
2020
IEEE/ACM Transactions on Audio Speech and Language Processing
The results suggest that building systems that go directly from speech-to-meaning and from meaning-to-speech, bypassing the need for text, is possible. ...
In the case of an unwritten language, however, speech technology is unfortunately difficult to create, because it cannot be created by the standard combination of pre-trained speech-to-text and text-to-speech ...
The authors would like to thank Sanjeev Khudanpur and the rest of the Johns Hopkins University team and the local team at Carnegie Mellon University for organizing the JSALT workshop. ...
doi:10.1109/taslp.2020.2973896
fatcat:mjhxfnrnq5g73jis6stemoogem
M2R2: Missing-Modality Robust emotion Recognition framework with iterative data augmentation
[article]
2022
arXiv
pre-print
Present models generally predict the speaker's emotions from the current utterance and its context, and their performance degrades considerably when modalities are missing. ...
First, a network called Party Attentive Network (PANet), which tracks all speakers' states and context, is designed to classify emotions. ...
More structures and techniques with suitable common representation learning methods should be tested, which we plan to explore in the future. ...
arXiv:2205.02524v1
fatcat:vh624wdr3bdjfeqogzv5yh2wri
Crossmodal Audio and Tactile Interaction with Mobile Touchscreens
2010
International Journal of Mobile Human Computer Interaction
The final study involved a longitudinal evaluation of a touchscreen application, CrossTrainer, focusing on longitudinal effects on performance with audio and tactile feedback, the impact of context on ...
Experiments showed that keyboards with audio or tactile feedback produce fewer errors and greater speeds of text entry compared to standard touchscreen keyboards. ...
, bimodal or trimodal conditions containing audio. ...
doi:10.4018/jmhci.2010100102
fatcat:wbntzbzojbhejgpkcr3jj7fjhy
The Acoustic and Auditory Contexts of Human Behavior
2015
Current Anthropology
We propose that a framework needs to be developed in which inferences can be made about the significance of sound in the past that are not bounded by the particularities of current cultural contexts ...
Such a framework should be multidisciplinary and draw on what is known scientifically about human sensitivities to and uses of sound, including nonverbal vocalizations, speech and music, ethological studies ...
Acknowledgments We would like to thank the two anonymous reviewers and Professor Steven Feld for their insightful comments and suggestions during the final preparation of this manuscript. ...
doi:10.1086/679445
fatcat:v2roy7hcxzgr3d4bgqmf54ub3y
Ubiquitous emotion-aware computing
2011
Personal and Ubiquitous Computing
The combination of heart rate variability and three speech measures (i.e., variability of the fundamental frequency of pitch (F0), intensity, and energy) explained 90% (p < .001) of the participants' experienced ...
Environment (or context), the personality trait neuroticism, and gender proved to be useful when a nuanced assessment of people's emotions was needed. ...
Acknowledgments The author gratefully acknowledges the support of the BrainGain Smart Mix Programme of the Netherlands Ministry of Economic Affairs and the Netherlands Ministry of Education, Culture and ...
doi:10.1007/s00779-011-0479-9
fatcat:rgxhiqgrafewbabhukji4mn4gu
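The three speech measures named in this abstract (variability of F0, intensity, and energy) can be approximated with standard audio tooling. The sketch below uses librosa with assumed parameters; it is not the study's actual feature-extraction pipeline.

```python
import numpy as np
import librosa

# Placeholder path and parameters, for illustration only.
y, sr = librosa.load("utterance.wav", sr=16000)

# Fundamental frequency (F0) track via probabilistic YIN; NaN on unvoiced frames.
f0, voiced_flag, voiced_prob = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr)
f0_variability = np.nanstd(f0)

# Short-time energy (RMS) and a dB-scaled intensity contour.
rms = librosa.feature.rms(y=y)[0]
mean_energy = float(np.mean(rms ** 2))
intensity_db = librosa.amplitude_to_db(rms, ref=np.max)
```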
Asian Social Science, Vol. 5, No. 8, August 2009. All in one PDF file
2009
Asian Social Science
In the eyes of literary critics, rhetorical texts include fiction, poems, prose, drama, and so on, while in the eyes of communication scholars they also cover texts of speech and debate. ...
Socio-cultural Knowledge and Miscommunication From the above examples we can see that the signaling of speech activities is not a matter of unilateral action but rather of speaker-listener coordination ...
How to make the best combination of resources and how to realize sales growth in marketing circulation is the main subject of this text. ...
doi:10.5539/ass.v5n8p0
fatcat:lu76tvd6pjagzkx5aq5n2hta7e
Showing results 1 — 15 out of 23 results