
Using emotional noise to uncloud audio-visual emotion perceptual evaluation

Emily Mower Provost, Irene Zhu, Shrikanth Narayanan
2013 2013 IEEE International Conference on Multimedia and Expo (ICME)  
In this work we present an approach to enhance our understanding of this process using the McGurk effect paradigm, a framework in which stimuli composed of mismatched audio and video cues are presented  ...  These results provide insight into the nature of audio-visual feature integration in emotion perception.  ...  The stimuli contain both emotionally matched (OAV) and emotionally mismatched (RAV) sentence-level audio-visual emotion displays.  ... 
doi:10.1109/icme.2013.6607537 dblp:conf/icmcs/ProvostZN13 fatcat:ibxtt43o6nasfegn2lcinnk7ju

Automatic Summarization of Cricket Highlights using Audio Processing

Ritwik Baranwal
2021 International journal of modern trends in science and technology  
The problem of automatic excitement detection in cricket videos is considered and applied for highlight generation.  ...  This paper focuses on detecting exciting events in video using complementary information from the audio and video domains. First, a method of audio and video elements separation is proposed.  ...  Sports Highlights Generation. Several methods have been proposed to automatically extract highlights from sports videos based on audio and visual cues.  ...
doi:10.46501/ijmtst070111 fatcat:wwigsfyi4bat3nuifeaurnutsy
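
The Baranwal (2021) entry above casts highlight generation as excitement detection from the audio track. As a rough illustration of that idea only (not the paper's pipeline), the sketch below flags windows whose short-term RMS energy is unusually high; the function name, window sizes, and z-score threshold are all assumptions chosen for the example.

```python
import numpy as np

def excitement_segments(audio, sr, win_s=1.0, hop_s=0.5, z_thresh=2.0):
    """Flag windows whose short-term RMS energy is unusually high.

    A crude proxy for crowd/commentator excitement: windows whose RMS energy
    lies more than `z_thresh` standard deviations above the mean are returned
    as candidate highlight regions (start, end) in seconds.
    """
    win = int(win_s * sr)
    hop = int(hop_s * sr)
    rms = np.array([
        np.sqrt(np.mean(audio[i:i + win] ** 2))
        for i in range(0, max(len(audio) - win, 1), hop)
    ])
    z = (rms - rms.mean()) / (rms.std() + 1e-8)
    return [(i * hop_s, i * hop_s + win_s) for i in np.flatnonzero(z > z_thresh)]

# Example: 60 s of low-level noise with a loud burst between 30 s and 33 s.
sr = 16000
audio = np.random.randn(60 * sr) * 0.05
audio[30 * sr:33 * sr] += np.random.randn(3 * sr) * 0.8
print(excitement_segments(audio, sr))
```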

Superpower Glass: Delivering Unobtrusive Real-time Social Cues in Wearable Systems [article]

Catalin Voss, Peter Washington, Nick Haber, Aaron Kline, Jena Daniels, Azar Fazel, Titas De, Beth McCarthy, Carl Feinstein, Terry Winograd, Dennis Wall
2020 arXiv   pre-print
In addition, we present a mobile application that enables users of the wearable aid to review their videos along with auto-curated emotional information on the video playback bar.  ...  We evaluate the system as a behavioral aid for children with Autism Spectrum Disorder (ASD), who can greatly benefit from real-time non-invasive emotional cues and are more sensitive to sensory input than  ...  use video clips from everyday conversations [38].  ...
arXiv:2002.06581v1 fatcat:zpgfmr53svcqxmuii7ta3qqfay

Human Perception of Audio-Visual Synthetic Character Emotion Expression in the Presence of Ambiguous and Conflicting Information

E. Mower, M.J. Mataric, S. Narayanan
2009 IEEE transactions on multimedia  
The feature sets extracted from emotionally matched audio-visual displays contained both audio and video features while feature sets resulting from emotionally mismatched audio-visual displays contained  ...  This study presents an analysis of the interaction between emotional audio (human voice) and video (simple animation) cues.  ...  This suggests that the audio and video data were providing emotionally confounding cues to the participant with respect to the sad and neutral emotion classes.  ... 
doi:10.1109/tmm.2009.2021722 fatcat:pcxzp2hou5cjveinkdjojwqota

Look&Listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement [article]

Junwen Xiong, Yu Zhou, Peng Zhang, Lei Xie, Wei Huang, Yufei Zha
2022 arXiv   pre-print
Active speaker detection and speech enhancement have become two increasingly attractive topics in audio-visual scenario understanding.  ...  Therefore, as a motivation to bridge the multi-modal associations in audio-visual tasks, a unified framework is proposed to achieve target speaker detection and speech enhancement with joint learning of  ...  Active Speaker Detection (ASD) is to find who is speaking in a video clip that contains more than one speaker.  ...
arXiv:2203.02216v2 fatcat:4dowhemn5bburltfwcjjgeohti

A tactile glove design and authoring system for immersive multimedia

Yeongmi Kim, Jongeun Cha, Jeha Ryu, Ian Oakley
2010 IEEE Multimedia  
Furthermore, systems that produce saliency maps from video may be able to highlight and detect key features and generate appropriate tactile cues.  ...  Lastly, the tactile video view window shows the output of the tactile video synchronized with visual scenes, separate from the video overlay.  ... 
doi:10.1109/mmul.2010.5692181 fatcat:xauo23uacvb33pb4q6arm2fajm

The effects of text, audio, video, and in-person communication on bonding between friends

Lauren E. Sherman, Minas Michikyan, Patricia M. Greenfield
2013 Cyberpsychology: Journal of Psychosocial Research on Cyberspace  
Bonding in each condition was measured through both self-report and affiliation cues (i.e., nonverbal behaviors associated with the emotional experience of bonding).  ...  However, bonding, as measured by both self-report and affiliation cues, differed significantly across conditions, with the greatest bonding during in-person interaction, followed by video chat, audio chat  ...  We measured emotional connectedness through both conscious self-report and through the nonconscious display of affiliation cues.  ... 
doi:10.5817/cp2013-2-3 fatcat:wgbkdxuuenhv7dh7qorry47fby

QoE of cross-modally mapped Mulsemedia: an assessment using eye gaze and heart rate

Gebremariam Mesfin, Nadia Hussain, Elahe Kani-Zabihi, Alexandra Covaci, Estêvão B. Saleme, Gheorghita Ghinea
2020 Multimedia tools and applications  
In our experiments, users were shown six video clips associated with certain visual features based on color, brightness, and shape.  ...  Our results highlight that when the olfactory content is crossmodally congruent with the visual content, the visual attention of the users seems shifted towards the correspondent visual feature.  ...  Q12 The sound enhanced the sense of reality whilst watching the video clip. Q13 The sound enhanced my viewing experience. Q14 I enjoyed watching the video clip whilst wearing a Haptic Vest.  ... 
doi:10.1007/s11042-019-08473-5 fatcat:tpnmbeoc6bh6hpqlmsuzik6hey

Increased Discriminability of Authenticity from Multimodal Laughter is Driven by Auditory Information

Nadine Lavan, Carolyn McGettigan
2017 Quarterly Journal of Experimental Psychology  
We discuss differences and potential mismatches in emotion signaling through voices and faces, in the context of spontaneous and volitional behavior, and highlight issues that should be addressed in future  ...  In a pilot study, we demonstrate that listeners perceive spontaneous laughs as more authentic than volitional ones, both in unimodal (audio-only, visual-only) and multimodal contexts (audiovisual).  ...  In line with studies looking at multimodal emotion categorization, detection rates for audiovisual stimuli were higher compared to audio-only and visual-only stimuli.  ... 
doi:10.1080/17470218.2016.1226370 pmid:27550795 fatcat:gc2c3gffhbagjogqsir6wedtke

Comparing Learning Methodologies for Self-Supervised Audio-Visual Representation Learning

Hacene Terbouche, Liam Schoneveld, Oisin Benson, Alice Othmani
2022 IEEE Access  
In this paper, a new self-supervised approach is proposed for learning audio-visual representations from large databases of unlabeled videos.  ...  It uses a future prediction task, and learns to align its visual representations with its corresponding audio representations.  ...  ACKNOWLEDGMENT Powder is making the camera of the metaverse and applies artificial intelligence to highlights detection in videogames, and emotion understanding (https://powder.gg/).  ... 
doi:10.1109/access.2022.3164745 fatcat:3g7xld2h2bgtpp2fru2n4yam6a
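
Terbouche et al. describe learning to align visual representations with their corresponding audio representations from large collections of unlabeled videos. One common way to express such audio-visual alignment is a symmetric InfoNCE-style contrastive loss over paired embeddings; the sketch below shows that generic formulation and is not claimed to be the paper's exact objective (the temperature value and batch handling are illustrative).

```python
import torch
import torch.nn.functional as F

def audio_visual_contrastive_loss(audio_emb, video_emb, temperature=0.07):
    """Symmetric InfoNCE loss: matched (audio_i, video_i) pairs are pulled
    together while mismatched pairs in the batch are pushed apart.

    audio_emb, video_emb: (batch, dim) embeddings from the two encoders.
    """
    a = F.normalize(audio_emb, dim=-1)
    v = F.normalize(video_emb, dim=-1)
    logits = a @ v.t() / temperature          # (batch, batch) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy usage with random embeddings standing in for encoder outputs.
audio = torch.randn(8, 128)
video = torch.randn(8, 128)
print(audio_visual_contrastive_loss(audio, video).item())
```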

A review of affective computing: From unimodal analysis to multimodal fusion

Soujanya Poria, Erik Cambria, Rajiv Bajpai, Amir Hussain
2017 Information Fusion  
Multimodality is defined by the presence of more than one modality or channel, e.g., visual, audio, text, gestures, and eye gaze.  ...  In this paper, we focus mainly on the use of audio, visual and text information for multimodal affect analysis, since around 90% of the relevant literature appears to cover these three modalities.  ...  Audio Modality Similar to text and visual feature analysis, emotion and sentiment analysis through audio features has specific components.  ...
doi:10.1016/j.inffus.2017.02.003 fatcat:ytebhjxlz5bvxcdghg4wxbvr6a

COGNIMUSE: a multimodal video database annotated with saliency, events, semantics and emotion with application to summarization

Athanasia Zlatintsi, Petros Koutras, Georgios Evangelopoulos, Nikolaos Malandrakis, Niki Efthymiou, Katerina Pastra, Alexandros Potamianos, Petros Maragos
2017 EURASIP Journal on Image and Video Processing  
... for the detection of perceptually salient events from videos.  ...  The purpose of this database is manifold; it can be used for training and evaluation of event detection and summarization algorithms, for classification and recognition of audio-visual and cross-media  ...  ca. 60% of the videos having highlight annotations.  ...
doi:10.1186/s13640-017-0194-1 fatcat:afaddslsknhjrktxqnlmgy4mgq

MELD: A Multimodal Multi-Party Dataset for Emotion Recognition in Conversations

Soujanya Poria, Devamanyu Hazarika, Navonil Majumder, Gautam Naik, Erik Cambria, Rada Mihalcea
2019 Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics  
Each utterance is annotated with emotion and sentiment labels, and encompasses audio, visual, and textual modalities.  ...  To address this gap, we propose the Multimodal EmotionLines Dataset (MELD), an extension and enhancement of EmotionLines.  ...  MELD contains raw videos, audio segments, and transcripts for multimodal processing. Additionally, we also provide the features used in our baseline experiments.  ... 
doi:10.18653/v1/p19-1050 dblp:conf/acl/PoriaHMNCM19 fatcat:qtbwbfyndffmdnhh35nauvpfhm
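
MELD is distributed as annotation CSVs plus the corresponding raw video and audio clips. A minimal sketch of inspecting the emotion label distribution with pandas is shown below; the file name and column names ('Utterance', 'Emotion', 'Sentiment', 'Dialogue_ID', 'Utterance_ID') are assumptions based on the commonly distributed release and should be checked against the actual files.

```python
import pandas as pd

# Hypothetical local path; MELD ships annotation CSVs alongside the clips.
ANNOTATIONS_CSV = "MELD/train_sent_emo.csv"  # assumed filename, verify against the release

df = pd.read_csv(ANNOTATIONS_CSV)

# Column names below are assumed from the commonly distributed annotation files.
print(df["Emotion"].value_counts())                                   # utterances per emotion class
print(df.groupby("Dialogue_ID")["Utterance_ID"].count().describe())   # utterances per dialogue
```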

Deep Personality Trait Recognition: A Survey

Xiaoming Zhao, Zhiwei Tang, Shiqing Zhang
2022 Frontiers in Psychology  
These methods are analyzed and summarized in both single modality and multiple modalities, such as audio, visual, text, and physiological signals.  ...  Gorbova et al. (2017, 2018) provided an automatic personality screening method on the basis of visual, audio, and text (lexical) cues from short video clips for predicting the Big-five  ...  In detail, the audio data and visual data were firstly extracted from the video clip. Then, the whole audio data were fed into an audio deep residual network for feature learning.  ...
doi:10.3389/fpsyg.2022.839619 pmid:35645923 pmcid:PMC9136483 fatcat:5eh2ohzjwff5jb4yjn6rzrw5ye
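
The Gorbova et al. pipeline cited in the survey snippet above extracts the audio track from a video clip and feeds it to an audio residual network. A rough sketch of that kind of audio branch follows: ffmpeg pulls a mono 16 kHz track, a log-mel spectrogram is computed, and a single-channel ResNet-18 yields a fixed-length feature vector. The file names, sampling rate, and choice of ResNet-18 are assumptions for illustration, not details taken from the paper.

```python
import subprocess
import torch
import torchaudio
import torchvision

# 1) Pull the audio track out of a video clip (hypothetical filenames).
subprocess.run(
    ["ffmpeg", "-y", "-i", "clip.mp4", "-vn", "-ac", "1", "-ar", "16000", "clip.wav"],
    check=True,
)

# 2) Turn the waveform into a log-mel spectrogram "image".
waveform, sr = torchaudio.load("clip.wav")
mel = torchaudio.transforms.MelSpectrogram(sample_rate=sr, n_mels=64)(waveform)
log_mel = torch.log(mel + 1e-6).unsqueeze(0)   # shape (1, 1, n_mels, time)

# 3) Feed it to a residual network adapted to single-channel input.
resnet = torchvision.models.resnet18(weights=None)
resnet.conv1 = torch.nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
features = torch.nn.Sequential(*list(resnet.children())[:-1])(log_mel)
print(features.squeeze().shape)   # a fixed-length audio feature vector
```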

Multimodal Saliency and Fusion for Movie Summarization Based on Aural, Visual, and Textual Attention

Georgios Evangelopoulos, Athanasia Zlatintsi, Alexandros Potamianos, Petros Maragos, Konstantinos Rapantzikos, Georgios Skoumas, Yannis Avrithis
2013 IEEE transactions on multimedia  
Detection of attention-invoking audiovisual segments is formulated in this work on the basis of saliency models for the audio, visual, and textual information conveyed in a video stream.  ...  Visual saliency is measured through a spatiotemporal attention model driven by intensity, color, and orientation.  ...  Clips are selected based on their attentional capacity through the computed multimodal, audio-visual-text (AVT) saliency.  ... 
doi:10.1109/tmm.2013.2267205 fatcat:jjt7xmjh5narlm5wr2strvrqza
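
Evangelopoulos et al. fuse audio, visual, and textual saliency and select clips by their attentional capacity. The sketch below shows one simple instantiation of that idea: min-max normalize each per-frame saliency curve, combine them with fixed weights into an AVT curve, and keep the top-k non-overlapping fixed-length segments. The equal weights and segment-selection rule are illustrative choices, not the paper's fusion scheme.

```python
import numpy as np

def fuse_saliency(audio_sal, visual_sal, text_sal, weights=(1/3, 1/3, 1/3)):
    """Weighted linear fusion of per-frame saliency curves into a single
    audio-visual-text (AVT) curve. Each input is min-max normalized first."""
    def norm(x):
        x = np.asarray(x, dtype=float)
        return (x - x.min()) / (x.max() - x.min() + 1e-8)
    wa, wv, wt = weights
    return wa * norm(audio_sal) + wv * norm(visual_sal) + wt * norm(text_sal)

def top_segments(saliency, seg_len, k=3):
    """Pick the k non-overlapping fixed-length segments with the highest mean
    fused saliency (a crude stand-in for summary clip selection)."""
    scores = [(saliency[s:s + seg_len].mean(), s)
              for s in range(0, len(saliency) - seg_len + 1, seg_len)]
    best = sorted(scores, reverse=True)[:k]
    return sorted(best, key=lambda t: t[1])   # return in chronological order

frames = 1000
fused = fuse_saliency(np.random.rand(frames), np.random.rand(frames), np.random.rand(frames))
print(top_segments(fused, seg_len=100))
```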