Face-to-Face Co-Located Human-Human Social Interaction Analysis using Nonverbal Cues: A Survey [article]

Cigdem Beyan and Alessandro Vinciarelli and Alessio Del Bue
2022 arXiv   pre-print
The covered topics fall into three categories: a) modeling social traits, such as leadership, dominance, and personality traits; b) social role recognition and social relations detection; and c) interaction  ...  The survey covers a wide spectrum of settings and scenarios, including free-standing interactions, meetings, indoor and outdoor social exchanges, dyadic conversations, and crowd dynamics.  ...  Alessandro Vinciarelli was supported by UKRI (EP/S02266X/1) and EPSRC (EP/N035305/1) grants.  ...
arXiv:2207.10574v1 fatcat:gaeilc2wqzfj5hrmewhd3wwtei

Responsive Listening Head Generation: A Benchmark Dataset and Baseline [article]

Mohan Zhou, Yalong Bai, Wei Zhang, Ting Yao, Tiejun Zhao, Tao Mei
2022 arXiv   pre-print
(e.g., head motions, facial expressions) in real time.  ...  We present a new listening head generation benchmark for synthesizing the responsive feedback of a listener (e.g., nod, smile) during a face-to-face conversation.  ...  We leverage the state-of-the-art deep learning-based 3D face reconstruction model [14] on the videos to get the 3DMM [6] coefficients.  ...
arXiv:2112.13548v3 fatcat:ogeenkslsfc5bavcxyt724ztde
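
The entry above mentions running a pretrained 3D face reconstruction model over conversation videos to obtain 3DMM coefficients. The following minimal Python sketch illustrates the shape of that preprocessing step only; `Reconstructor` is a hypothetical stand-in for the pretrained network cited as [14], and the 80/64/6 coefficient split is a typical convention, not necessarily the paper's.

```python
# Minimal sketch of per-frame 3DMM coefficient extraction (requires numpy
# and opencv-python). `Reconstructor` is a hypothetical stand-in for a
# pretrained 3D face reconstruction network such as the one cited as [14].
import numpy as np
import cv2

class Reconstructor:
    """Hypothetical stand-in for a pretrained 3D face reconstruction net."""
    N_ID, N_EXP, N_POSE = 80, 64, 6  # a typical 3DMM coefficient split

    def __call__(self, frame_bgr: np.ndarray) -> np.ndarray:
        # A real model would regress identity/expression/pose coefficients
        # from the (cropped) face; here we return zeros of the right shape.
        return np.zeros(self.N_ID + self.N_EXP + self.N_POSE, dtype=np.float32)

def video_to_3dmm(path: str) -> np.ndarray:
    """Decode a conversation video and stack per-frame 3DMM coefficients."""
    model, coeffs = Reconstructor(), []
    cap = cv2.VideoCapture(path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        coeffs.append(model(frame))
    cap.release()
    return np.stack(coeffs)  # shape: (n_frames, 150)
```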

Talking Faces: Audio-to-Video Face Generation [chapter]

Yuxin Wang, Linsen Song, Wayne Wu, Chen Qian, Ran He, Chen Change Loy
2022 Advances in Computer Vision and Pattern Recognition  
The emergence of deep learning and cross-modality research has led to many interesting works that address talking face generation.  ...  Despite great research efforts in talking face generation, the problem remains challenging due to the need for fine-grained control of face components and the generalization to arbitrary sentences.  ...  [107] for a benchmark designed for evaluating talking-head video generation. • Zhu et al. [120] for a survey on deep audio-visual learning.  ... 
doi:10.1007/978-3-030-87664-7_8 fatcat:5qh2bxrthrbthgjwjzlmm3je4i

Predicting Speaker Head Nods and the Effects of Affective Information

Jina Lee, Stacy C. Marsella
2010 IEEE Transactions on Multimedia  
In this paper, we present a machine learning approach for learning models of head movements by focusing on when speaker head nods should occur, and conduct evaluation studies that compare the nods generated  ...  During face-to-face conversation, our body is continually in motion, displaying various head, gesture, and posture movements.  ...  Foster and Oberlander [12] present a corpus-based generation of head and eyebrow motion for virtual agents.  ... 
doi:10.1109/tmm.2010.2051874 fatcat:mzjvua2ihfbhtme4oh27s37oh4
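
Lee and Marsella learn models that decide when a speaker head nod should occur. The sketch below shows only the general shape of such a sequence labeller; it is not the paper's model (which is learned from annotated corpora with richer linguistic and affective features). The window size, the four-dimensional toy features, and the logistic-regression classifier are all illustrative assumptions.

```python
# Minimal sketch of per-word nod prediction: a classifier maps windowed
# per-word features to nod / no-nod decisions. Feature content and the
# logistic-regression model are illustrative, not the paper's design.
import numpy as np
from sklearn.linear_model import LogisticRegression

WINDOW = 3  # words of left context, an arbitrary choice for this sketch

def windowed(features: np.ndarray) -> np.ndarray:
    """Stack each word's features with its WINDOW predecessors (zero-padded)."""
    padded = np.vstack([np.zeros((WINDOW, features.shape[1])), features])
    return np.hstack([padded[i:i + len(features)] for i in range(WINDOW + 1)])

# Toy training data: 200 words x 4 features, with random nod annotations.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = rng.integers(0, 2, size=200)  # 1 = a nod was annotated on this word

clf = LogisticRegression(max_iter=1000).fit(windowed(X), y)
nod_prob = clf.predict_proba(windowed(X))[:, 1]  # per-word nod probability
```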

End-to-End Listening Agent for Audiovisual Emotional and Naturalistic Interactions

Kevin El Haddad, Yara Rizk, Louise Heron, Nadine Hajj, Yong Zhao, Jaebok Kim, Trung Ngô Trọng, Minha Lee, Marwan Doumit, Payton Lin, Yelin Kim, Hüseyin Çakmak
2018 Journal of Science and Technology of the Arts  
First, a multimodal multitask deep learning-based emotion classification system was built, along with a rule-based visual expression detection system.  ...  Finally, a naturalistic, dyadic emotional conversation database was collected. We report here the work done on each of these modules and our planned future improvements.  ...  Yara Rizk is a PhD student enrolled in the electrical and computer engineering department at the American University of Beirut (AUB).  ...
doi:10.7559/citarj.v10i2.424 fatcat:h2gdjywrdncjtaknouneed7ku4
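
The "multimodal multitask deep learning-based emotion classification system" suggests a shared encoder with several task heads trained jointly. Here is a minimal PyTorch sketch of that architecture pattern; the feature dimensions, the smile-presence auxiliary head, and the small encoder are illustrative assumptions, not the authors' design.

```python
# Minimal sketch of a multimodal multitask classifier: audio and visual
# feature vectors share a fused encoder, and two task heads train jointly.
import torch
import torch.nn as nn

class MultimodalMultitask(nn.Module):
    def __init__(self, audio_dim=40, video_dim=128, hidden=64, n_emotions=5):
        super().__init__()
        self.encoder = nn.Sequential(              # shared fused encoder
            nn.Linear(audio_dim + video_dim, hidden), nn.ReLU())
        self.emotion_head = nn.Linear(hidden, n_emotions)  # task 1: emotion
        self.smile_head = nn.Linear(hidden, 1)             # task 2: smile?

    def forward(self, audio, video):
        z = self.encoder(torch.cat([audio, video], dim=-1))
        return self.emotion_head(z), torch.sigmoid(self.smile_head(z))

model = MultimodalMultitask()
emotion_logits, smile_p = model(torch.randn(8, 40), torch.randn(8, 128))
```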

When I Look into Your Eyes: A Survey on Computer Vision Contributions for Human Gaze Estimation and Tracking

Dario Cazzato, Marco Leo, Cosimo Distante, Holger Voos
2020 Sensors  
A very long journey has been made from the first pioneering works, and this continuous search for more accurate solutions has been further boosted in the last decade, when deep neural networks have  ...  The automatic detection of eye positions, their temporal consistency, and their mapping into a line of sight in the real world (to find where a person is looking) is reported in the scientific literature  ...  h.p. for head pose, DL for deep learning.  ...
doi:10.3390/s20133739 pmid:32635375 pmcid:PMC7374327 fatcat:jwou6gv4f5dy7lrsxvtbnb2fly
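
The survey's phrase "mapping into a line of sight in the real world" boils down, geometrically, to intersecting a gaze ray with a known surface. A minimal sketch, assuming an upstream estimator already provides a 3D eye position and a unit gaze direction:

```python
# Intersect a gaze ray with a known plane (e.g., a screen) to find the
# point of regard. Pure geometry; inputs come from an upstream estimator.
import numpy as np

def point_of_regard(eye: np.ndarray, gaze_dir: np.ndarray,
                    plane_point: np.ndarray, plane_normal: np.ndarray):
    """Intersect the ray eye + t*gaze_dir (t >= 0) with a plane."""
    denom = gaze_dir @ plane_normal
    if abs(denom) < 1e-9:                       # ray parallel to the plane
        return None
    t = ((plane_point - eye) @ plane_normal) / denom
    return eye + t * gaze_dir if t >= 0 else None  # None: looking away

# Example: eye 60 cm in front of a screen at z = 0, gazing slightly left.
direction = np.array([-0.1, 0.0, -1.0])
direction /= np.linalg.norm(direction)
print(point_of_regard(np.array([0.0, 0.0, 0.6]), direction,
                      np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])))
```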

Detection of social signals for recognizing engagement in human-robot interaction [article]

Divesh Lala, Koji Inoue, Pierrick Milhorat, Tatsuya Kawahara
2017 arXiv   pre-print
Our motivation in this work is to detect several behaviors which will be used as social signal inputs for a real-time engagement recognition model.  ...  Input data to the models comes from a Kinect sensor and a microphone array.  ...  We propose that the models can function in a varied number of conversational settings, including multi-party dialogue.  ... 
arXiv:1709.10257v1 fatcat:4g5yo4s7jzgjzkdoeuwrzzum6q
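
The detected social signals (nodding, laughter, verbal backchannels, eye gaze) serve as inputs to an engagement recognition model. The sketch below illustrates only the pipeline shape, with simulated detector outputs and hand-picked fusion weights; the paper's actual engagement model is learned from annotated interaction data.

```python
# Minimal sketch: fuse per-frame binary social-signal detections into a
# smoothed engagement estimate. Weights and threshold are illustrative.
import numpy as np

rng = np.random.default_rng(1)
# Simulated detector outputs over 100 frames, one binary stream per signal.
signals = {"nod": rng.integers(0, 2, 100),
           "laugh": rng.integers(0, 2, 100),
           "backchannel": rng.integers(0, 2, 100),
           "gaze_at_robot": rng.integers(0, 2, 100)}
weights = {"nod": 0.3, "laugh": 0.2, "backchannel": 0.2, "gaze_at_robot": 0.3}

score = sum(w * signals[k] for k, w in weights.items())        # per frame
engaged = np.convolve(score, np.ones(10) / 10, "same") > 0.5   # smoothed
print(f"engaged on {engaged.mean():.0%} of frames")
```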

Let's face it: Probabilistic multi-modal interlocutor-aware generation of facial gestures in dyadic settings [article]

Patrik Jonell, Taras Kucherenko, Gustav Eje Henter, Jonas Beskow
2020 arXiv   pre-print
Our contributions are: a) a method for feature extraction from multi-party video and speech recordings, resulting in a representation that allows for independent control and manipulation of expression  ...  and speech articulation in a 3D avatar; b) an extension to MoGlow, a recent motion-synthesis method based on normalizing flows, to also take multi-modal signals from the interlocutor as input and subsequently  ...  ACKNOWLEDGMENTS The authors would like to acknowledge the support from the Swedish Foundation for Strategic Research, project EACare [33]  ... 
arXiv:2006.09888v1 fatcat:i3v7cfl4tzgmzgpuboiktr66tq
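
MoGlow-style synthesis stacks flow layers whose transformations are conditioned on control signals; the paper's extension feeds the interlocutor's multimodal signals into that conditioning. A minimal PyTorch sketch of one conditional affine coupling layer follows, with illustrative dimensions and a deliberately small conditioner network (not the paper's architecture):

```python
# One affine coupling layer (the building block of normalizing flows)
# whose scale/shift are conditioned on a context vector, e.g., own speech
# features concatenated with the interlocutor's multimodal signals.
import torch
import torch.nn as nn

class ConditionalCoupling(nn.Module):
    def __init__(self, dim: int, ctx_dim: int, hidden: int = 64):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(                  # conditioner network
            nn.Linear(self.half + ctx_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)))

    def forward(self, x, ctx):
        """x: (B, dim) pose features; ctx: (B, ctx_dim) conditioning."""
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(torch.cat([x1, ctx], dim=-1)).chunk(2, dim=-1)
        s = torch.tanh(s)                          # keep scales stable
        y2 = x2 * torch.exp(s) + t                 # affine transform
        log_det = s.sum(dim=-1)                    # for the flow's NLL
        return torch.cat([x1, y2], dim=-1), log_det

layer = ConditionalCoupling(dim=50, ctx_dim=32)
y, log_det = layer(torch.randn(8, 50), torch.randn(8, 32))
```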

VR content creation and exploration with deep learning: A survey

Miao Wang, Xu-Quan Lyu, Yi-Jun Li, Fang-Lue Zhang
2020 Computational Visual Media  
Because deep learning systems are able to represent and compose information at various levels in a deep hierarchical fashion, they can build very powerful models which leverage large quantities of visual  ...  This article surveys recent research that uses such deep learning methods for VR content creation and exploration.  ...  Fang-Lue Zhang was supported by a Victoria Early-Career Research Excellence Award.  ... 
doi:10.1007/s41095-020-0162-z fatcat:lgogzx26bvhn5f7uyefjkz7zny

User Attention and Behaviour in Virtual Reality Art Encounter [article]

Mu Mu, Murtada Dohan, Alison Goodyear, Gary Hill, Cleyon Johns, Andreas Mauthe
2020 arXiv   pre-print
Deep learning models are used to study the connections between behavioural data and audience background.  ...  The data from a user experiment with 35 participants reveal a range of user activity patterns in art exploration.  ...  The deep learning training on each of the three network structures was carried out 20 times independently.  ... 
arXiv:2005.10161v1 fatcat:f6nxqblpgrcfnayvghjrpcubea
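
One methodological detail worth noting above is that each network was trained 20 times independently, so results reflect the spread across runs rather than a single seed. A toy sketch of that protocol, with `train_and_evaluate` as a hypothetical placeholder for any of the three network structures:

```python
# Report mean and spread over 20 independent training runs, so a single
# lucky random seed cannot masquerade as a result.
import random
import statistics

def train_and_evaluate(seed: int) -> float:
    random.seed(seed)                    # placeholder "training": returns
    return 0.7 + random.random() / 10    # a simulated accuracy in [0.7, 0.8)

accuracies = [train_and_evaluate(seed) for seed in range(20)]
print(f"accuracy: {statistics.mean(accuracies):.3f} "
      f"+/- {statistics.stdev(accuracies):.3f} over {len(accuracies)} runs")
```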

A Preliminary Exploration of Group Social Engagement Level Recognition in Multiparty Casual Conversation [chapter]

Yuyun Huang, Emer Gilmartin, Benjamin R. Cowan, Nick Campbell
2016 Lecture Notes in Computer Science  
Sensing human social engagement in dyadic or multiparty conversation is key to the design of decision strategies in conversational dialogue agents, which must choose suitable strategies in various human-machine interaction  ...  In this paper we report on studies we have carried out on the novel research topic of social group engagement in non-task-oriented (casual) multiparty conversations.  ...  This research is supported by Science Foundation Ireland through the CNGL Programme (Grant 12/CE/I2267) in the ADAPT Centre and the CHISTERA-JOKER project at Trinity College Dublin.  ...
doi:10.1007/978-3-319-43958-7_8 fatcat:ez3dqg6hdbfktclvveg7myb2ba

Social Eye Gaze in Human-Robot Interaction: A Review

Henny Admoni, Brian Scassellati
2017 Journal of Human-Robot Interaction  
It establishes three categories of gaze research in HRI, defined by differences in goals and methods: a human-centered approach, which focuses on people's responses to gaze; a design-centered approach,  ...  This article reviews the state of the art in social eye gaze for human-robot interaction (HRI).  ...  The studies in this section are aligned into three general topics: • How people use eye gaze for conversation and speech (relevant to Sections 5.1 and 5.2) • How people use eye gaze when they refer to  ... 
doi:10.5898/jhri.6.1.admoni fatcat:w2lvp2gfxrcdzlvw6ujpf5z3by

Deep Learning for Visual Speech Analysis: A Survey [article]

Changchong Sheng, Gangyao Kuang, Liang Bai, Chenping Hou, Yulan Guo, Xin Xu, Matti Pietikäinen, Li Liu
2022 arXiv   pre-print
Over the past five years, numerous deep learning-based methods have been proposed to address various problems in this area, especially automatic visual speech recognition and generation.  ...  As a powerful AI strategy, deep learning techniques have extensively promoted the development of visual speech learning.  ...  [table excerpt] used networks to learn the motion and texture separately; regressed the head motions in accordance with audio dynamics; proposed an Audio ID-Removing Network for pure speech feature learning; proposed a memory-augmented  ...
arXiv:2205.10839v1 fatcat:l5m4ohtcvnevrliaiwawg3phjq

Detecting socially interacting groups using f-formation: A survey of taxonomy, methods, datasets, applications, challenges, and future research directions [article]

Hrishav Bakul Barua, Theint Haythi Mg, Pradip Pramanick, Chayan Sarkar
2021 arXiv   pre-print
In this article, we provide a comprehensive survey of the existing work on social interaction and group detection using f-formation for robotics and other applications.  ...  In this article, we investigate one such social behavior for collocated robots. Imagine that a group of people is interacting with each other and we want to join the group.  ...  [table excerpt] a method that uses a joint learning framework for estimating the head and body orientations of targets and the f-formations of conversational groups; a method based on pedestrian motion estimation (2018) [105].  ...
arXiv:2108.06181v2 fatcat:walfqfi55fe4fja3imr4qu6asu
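
One classic family of f-formation detectors covered by such surveys is Hough-style voting for the shared O-space: each person casts a vote at a fixed stride along their body orientation, and tightly clustered votes indicate a conversational group. A minimal sketch, with illustrative stride and clustering-radius parameters:

```python
# Hough-style O-space voting for f-formation detection (requires numpy
# and scipy). Each person votes for an O-space centre along their body
# orientation; votes that cluster within RADIUS form one group.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

STRIDE = 0.8   # metres from body to the assumed O-space centre
RADIUS = 0.5   # max spread of votes counted as one O-space

def detect_f_formations(positions: np.ndarray, orientations: np.ndarray):
    """positions: (N, 2) metres; orientations: (N,) radians. Returns labels."""
    votes = positions + STRIDE * np.stack(
        [np.cos(orientations), np.sin(orientations)], axis=1)
    return fcluster(linkage(votes, "single"), RADIUS, criterion="distance")

# Two people facing each other ~1.6 m apart, plus one outsider facing away.
pos = np.array([[0.0, 0.0], [1.6, 0.0], [4.0, 0.0]])
ori = np.array([0.0, np.pi, 0.0])
print(detect_f_formations(pos, ori))  # e.g., [1 1 2]: first two form a group
```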

Engagement in Human-Agent Interaction: An Overview

Catharine Oertel, Ginevra Castellano, Mohamed Chetouani, Jauwairia Nasir, Mohammad Obaid, Catherine Pelachaud, Christopher Peters
2020 Frontiers in Robotics and AI  
We also present models for detecting engagement and for generating multimodal behaviors to show engagement.  ...  are conducted in a lab or aimed at long-term interaction.  ...  Recurrent and deep neural networks were used to detect decreases in user engagement in real time, based on analysis of the user's behaviors such as proxemics, gaze, head motion, facial expressions, and speech.  ...
doi:10.3389/frobt.2020.00092 pmid:33501259 pmcid:PMC7806067 fatcat:zdvt343f55a33l7bo33zwr7ta4
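
The last fragment above describes a recurrent detector for decreasing engagement. A minimal PyTorch sketch of that shape, where the 20-dimensional input standing in for proxemics, gaze, head motion, facial expression, and speech features is an illustrative placeholder:

```python
# A GRU reads a window of multimodal behavior features and outputs the
# probability that the user's engagement is decreasing.
import torch
import torch.nn as nn

class EngagementDecreaseDetector(nn.Module):
    def __init__(self, n_features: int = 20, hidden: int = 32):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                        # x: (batch, time, features)
        _, h = self.gru(x)                       # h: (1, batch, hidden)
        return torch.sigmoid(self.head(h[-1]))   # (batch, 1) probability

detector = EngagementDecreaseDetector()
window = torch.randn(4, 50, 20)                  # 4 users, 50 frames each
print(detector(window).squeeze(-1))              # per-user decrease probability
```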
Showing results 1 to 15 of 23,904