5,069 Hits in 3.7 sec

Deep Reinforcement Learning for Audio-Visual Gaze Control

Stéphane Lathuilière, Benoit Massé, Pablo Mesejo, Radu Horaud
2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)  
Deep learning revolutionized data science, and recently its popularity has grown exponentially, as has the number of papers employing deep networks.  ...  Our results reinforce the hypothesis according to which, in general, a general-purpose network (e.g.  ...  In summary, the analysis of prior work shows, first, the absence of a systematic evaluation of deep learning advances in regression and, second, an overabundance of papers based on deep learning  ... 
doi:10.1109/iros.2018.8594327 dblp:conf/iros/LathuiliereMMH18 fatcat:4s4e3j7zdrdtjpifeqr6ulibey

Neural network based reinforcement learning for audio–visual gaze control in human–robot interaction

Stéphane Lathuilière, Benoit Massé, Pablo Mesejo, Radu Horaud
2018 Pattern Recognition Letters  
This paper introduces a novel neural network-based reinforcement learning approach for robot gaze control.  ...  Our approach enables a robot to learn and to adapt its gaze control strategy for human-robot interaction neither with the use of external sensors nor with human supervision.  ...  First, robot gaze control is formulated as a reinforcement learning problem, allowing the robot to autonomously learn its own gaze control strategy from multimodal data.  ... 
doi:10.1016/j.patrec.2018.05.023 fatcat:yfybwgigmjdlvnwm3s5unhrqrq
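The formulation this entry describes, robot gaze control cast as a reinforcement learning problem, can be illustrated with a minimal tabular sketch. Everything here is a hypothetical stand-in for the paper's actual deep-network approach: the four gaze directions, the observable speaker direction as state, the 0/1 reward for looking at the speaker, and the one-step (bandit-style) simplification.

```python
import random

# Toy illustration (not the paper's actual model): the robot picks one of
# N gaze directions; reward is 1 when it looks at the current speaker,
# 0 otherwise. A one-step simplification keeps the example short.
N = 4                       # hypothetical number of gaze directions
ALPHA, EPSILON = 0.1, 0.1   # learning rate, exploration rate

def train_gaze_policy(episodes=2000, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * N for _ in range(N)]            # Q[state][action]
    for _ in range(episodes):
        state = rng.randrange(N)                 # speaker's direction
        if rng.random() < EPSILON:
            action = rng.randrange(N)            # explore
        else:
            action = max(range(N), key=lambda a: q[state][a])  # exploit
        reward = 1.0 if action == state else 0.0  # gaze hit the speaker
        q[state][action] += ALPHA * (reward - q[state][action])
    return q

q = train_gaze_policy()
# Greedy policy: for each speaker direction, where does the robot look?
policy = [max(range(N), key=lambda a: q[s][a]) for s in range(N)]
```

Under this toy reward, the greedy policy converges to looking directly at the speaker; the paper's contribution is doing this from raw multimodal data with a neural Q-function rather than a table.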

A Deep Reinforcement Learning Approach to Audio-Based Navigation in a Multi-Speaker Environment [article]

Petros Giannakopoulos, Aggelos Pikrakis, Yannis Cotronis
2021 arXiv   pre-print
In this work we use deep reinforcement learning to create an autonomous agent that can navigate in a two-dimensional space using only raw auditory sensory information from the environment, a problem that has received very little attention in the reinforcement learning literature.  ...  In [5], the authors used deep reinforcement learning for controlling the gaze of a robotic head based on audio and visual data from the virtual environment.  ... 
arXiv:2105.04488v1 fatcat:zccnole5j5anrenwswbm5zxes4

A Deep Reinforcement Learning Approach for Audio-based Navigation and Audio Source Localization in Multi-speaker Environments [article]

Petros Giannakopoulos, Aggelos Pikrakis, Yannis Cotronis
2021 arXiv   pre-print
In this work we apply deep reinforcement learning to the problems of navigating a three-dimensional environment and inferring the locations of human speaker audio sources within, in the case where the  ...  For this purpose we create two virtual environments using the Unity game engine, one presenting an audio-based navigation problem and one presenting an audio source localization problem.  ...  In [7], the authors employed deep reinforcement learning for controlling the gaze of a robotic head based on audio and visual data from a virtual environment.  ... 
arXiv:2110.12778v3 fatcat:abrnq4gjmfenlkvmp7dmctvy6e

Paper Titles

2019 IEEE 8th Global Conference on Consumer Electronics (GCCE)  
Camera System Route Control for Vehicle Access Point for Pedestrian Safe Reinforcement Learning in Continuous State Spaces Safety Monitoring and Museum Visitor Behavior Measurement Using Fish-Eye Camera  ...  for Audio-into-Image Algorithm Deep-Learning Based Pedestrian Direction Detection for Anti-collision of Intelligent Self-propelled Vehicles Deployment and Evaluation of Elite, an Open Source Implementation  ... 
doi:10.1109/gcce46687.2019.9015409 fatcat:6k3r6jixrvglrkrkzek636gb54

Multimodal Sentiment Analysis with Word-Level Fusion and Reinforcement Learning [article]

Minghai Chen, Sen Wang, Paul Pu Liang, Tadas Baltrušaitis, Amir Zadeh, Louis-Philippe Morency
2018 arXiv   pre-print
... architecture for multimodal sentiment analysis that performs modality fusion at the word level.  ...  Qualitative analysis on our model emphasizes the importance of the Temporal Attention Layer in sentiment prediction because the additional acoustic and visual modalities are noisy.  ...  We have a controller C_a, with weights θ_a, for determining the on/off of the audio modality, and C_v, with weights θ_v, for determining the on/off of the visual modality.  ... 
arXiv:1802.00924v1 fatcat:xftn45m53jekxcwd7st3kkhzry
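The snippet above mentions controllers that switch the audio and visual modalities on or off before fusion. A minimal sketch of that gating mechanic follows; the logistic controller, the feature vectors, the weights, and the 0.5 threshold are all illustrative assumptions (the paper itself trains such controllers with reinforcement learning, which this sketch does not show).

```python
import math

def controller(features, weights, bias=0.0):
    """Logistic gate: probability of keeping this modality (illustrative)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def gate_modality(features, weights, threshold=0.5):
    """Zero out the modality's features when the controller says 'off'."""
    keep = controller(features, weights) >= threshold
    return [x if keep else 0.0 for x in features], keep

# Toy word-level feature vectors and controller weights (hypothetical).
audio = [0.8, -0.2, 0.5]
visual = [0.1, 0.0, -0.3]
theta_a = [1.0, 0.5, 1.0]
theta_v = [1.0, 1.0, 1.0]

gated_audio, a_on = gate_modality(audio, theta_a)
gated_visual, v_on = gate_modality(visual, theta_v)
fused = gated_audio + gated_visual   # fusion by concatenation at this word
```

Here the noisy visual vector scores below threshold and is zeroed before fusion, while the audio vector passes through unchanged, which is the behavior the snippet attributes to C_a and C_v.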

Detecting Depression with Word-Level Multimodal Fusion

Morteza Rohanian, Julian Hough, Matthew Purver
2019 Zenodo  
To mitigate noisy modalities, we utilize fusion gates that control the degree to which the audio or visual modality contributes to the final prediction.  ...  We propose a model that is able to perform modality fusion incrementally after each word in an utterance using a time-dependent recurrent approach in a deep learning set-up.  ...  s alternative deep learning model which uses two LSTMs (audio-based and text-based) and a final feedforward network to model sequences of interactions for detecting depression [18] ; (iv) Williamson et  ... 
doi:10.5281/zenodo.3689458 fatcat:k3m3b7edrbfhfnugcuqejqvzwu

Table of Contents

2018 2018 IEEE International Symposium on Multimedia (ISM)  
Watson Research Center) Deep Reinforcement Learning with Parameterized Action Space for Object Detection 101 Zheng Wu (Ryerson University), Naimul Mefraz Khan (Ryerson University), Lei Gao (Ryerson University  ...  Viitanen (Tampere University of Technology), Jarno Vanne (Tampere University of Technology), and Timo Hämäläinen (Tampere University of Technology) Deep Learning of Human Perception in Audio Event Classification  ... 
doi:10.1109/ism.2018.00004 fatcat:itdeitfrvfcabi3bnp772wfzte

Multi-level Attention network using text, audio and video for Depression Prediction [article]

Anupama Ray, Siddharth Kumar, Rutvik Reddy, Prerana Mukherjee, Ritu Garg
2019 arXiv   pre-print
The multi-level attention reinforces overall learning by selecting the most influential features within each modality for the decision making.  ...  This paper presents a novel multi-level attention based network for multi-modal depression prediction that fuses features from audio, video and text modalities while learning the intra and inter modality  ...  Apart from these low-level features mentioned above, a high dimensional deep representation of the audio sample is extracted by passing the audio through a Deep Spectrum and a VGG network.  ... 
arXiv:1909.01417v1 fatcat:qkyp2v5kzba7bj7inewz3rt3la

Speech Driven Backchannel Generation using Deep Q-Network for Enhancing Engagement in Human-Robot Interaction [article]

Nusrah Hussain, Engin Erzin, T. Metin Sezgin, Yucel Yemez
2019 arXiv   pre-print
Therefore, we introduce a deep Q-network (DQN) in a batch reinforcement learning framework, where an optimal policy is learned from a batch of data collected using a more controlled policy.  ...  We address the problem within an off-policy reinforcement learning framework, and show how a robot may learn to produce non-verbal backchannels like laughs, when trained to maximize the engagement and  ...  [10] use recurrent neural network architecture in combination with Q-learning to find an optimal policy for robot gaze control in HRI.  ... 
arXiv:1908.01618v1 fatcat:tcwairlrvjc6zm72cr4vuqxnei

Gaze Control of a Robotic Head for Realistic Interaction With Humans

Jaime Duque-Domingo, Jaime Gómez-García-Bermejo, Eduardo Zalama
2020 Frontiers in Neurorobotics  
When there is an interaction between a robot and a person, gaze control is very important for face-to-face communication.  ...  A robotic head has been designed and built and a virtual agent projected on the robot's face display has been integrated with the gaze control.  ...  The author uses a convolutional neural network to predict object locations and a reinforcement learning method for robotic gaze control.  ... 
doi:10.3389/fnbot.2020.00034 pmid:32625075 pmcid:PMC7311780 fatcat:3s5glcnk7ndzrge2mzb27nf3je

Speech Driven Backchannel Generation Using Deep Q-Network for Enhancing Engagement in Human-Robot Interaction

Nusrah Hussain, Engin Erzin, T. Metin Sezgin, Yücel Yemez
2019 Interspeech 2019  
Therefore, we introduce a deep Q-network (DQN) in a batch reinforcement learning framework, where an optimal policy is learned from a batch of data collected using a more controlled policy.  ...  We address the problem within an off-policy reinforcement learning framework, and show how a robot may learn to produce non-verbal backchannels like laughs, when trained to maximize the engagement and  ...  [10] use recurrent neural network architecture in combination with Q-learning to find an optimal policy for robot gaze control in HRI.  ... 
doi:10.21437/interspeech.2019-2521 dblp:conf/interspeech/HussainESY19 fatcat:kvjlh655a5a3nllbc5wexgsg7y
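The batch (offline) reinforcement learning setup these two entries describe, learning an optimal policy from a fixed dataset collected under a more controlled behavior policy, can be sketched with fitted Q-iteration over a tiny synthetic batch. The two states, the action names, and the reward (a stand-in for the engagement signal) are all hypothetical; the papers use a deep Q-network where this sketch uses a table.

```python
# Illustrative sketch, not the papers' model: replay a fixed batch of
# (state, action, reward, next_state) transitions to fit Q-values.
GAMMA, ALPHA = 0.9, 0.1
ACTIONS = ("no_backchannel", "laugh")   # hypothetical action set

# Synthetic offline data; reward stands in for an engagement signal.
batch = [
    (0, "no_backchannel", 0.0, 1),
    (1, "laugh",          1.0, 0),   # well-timed laugh raises engagement
    (0, "laugh",          0.0, 1),   # mistimed backchannel earns nothing
    (1, "no_backchannel", 0.0, 0),
]

q = {(s, a): 0.0 for s in (0, 1) for a in ACTIONS}
for _ in range(500):                     # sweep the fixed batch repeatedly
    for s, a, r, s2 in batch:
        target = r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
        q[(s, a)] += ALPHA * (target - q[(s, a)])

# Greedy policy extracted from the fitted Q-values.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in (0, 1)}
```

No new data is collected during learning, which is the defining property of the batch setting: the agent only re-sweeps transitions gathered under the earlier controlled policy.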

Agents that Listen: High-Throughput Reinforcement Learning with Multiple Sensory Systems [article]

Shashank Hegde, Anssi Kanervisto, Aleksei Petrenko
2021 arXiv   pre-print
To facilitate progress in this area we introduce a new version of the ViZDoom simulator to create a highly efficient learning environment that provides raw audio observations.  ...  Learning to act based on combined visual and auditory inputs is still a new topic of research that has not been explored beyond simple scenarios.  ...  ACKNOWLEDGMENTS We thank the reviewers of this paper for very insightful comments and ideas for future work, which we included in the previous section.  ... 
arXiv:2107.02195v1 fatcat:kdzssqpslndxzm2mqfwmythwem

DAVE: A Deep Audio-Visual Embedding for Dynamic Saliency Prediction [article]

Hamed R. Tavakoli, Ali Borji, Esa Rahtu, Juho Kannala
2020 arXiv   pre-print
This paper studies audio-visual deep saliency prediction.  ...  It introduces a conceptually simple and effective Deep Audio-Visual Embedding for dynamic saliency prediction dubbed "DAVE" in conjunction with our efforts towards building an Audio-Visual Eye-tracking  ...  This superiority is further reinforced with the audio features in the Audio-Visual model. VII.  ... 
arXiv:1905.10693v2 fatcat:5tby44imzrcnvflhdfj4rrasie

2021 Index IEEE Transactions on Multimedia Vol. 23

2021 IEEE transactions on multimedia  
The Author Index contains the primary entry for each item, listed under the first author's name.  ...  ., +, TMM 2021 3377-3387 TCLiVi: Transmission Control in Live Video Streaming Based on Deep Reinforcement Learning.  ...  Liu, J., +, TMM 2021 1530-1541 TCLiVi: Transmission Control in Live Video Streaming Based on Deep Reinforcement Learning.  ... 
doi:10.1109/tmm.2022.3141947 fatcat:lil2nf3vd5ehbfgtslulu7y3lq
Showing results 1 — 15 out of 5,069 results