4,880 Hits in 5.3 sec

A Deep Reinforcement Learning Approach to Audio-Based Navigation in a Multi-Speaker Environment [article]

Petros Giannakopoulos, Aggelos Pikrakis, Yannis Cotronis
2021 arXiv   pre-print
In this work we use deep reinforcement learning to create an autonomous agent that can navigate in a two-dimensional space using only raw auditory sensory information from the environment, a problem that  ...  The agent is shown to be robust to speaker pitch shifting and it can learn to navigate the environment even when a limited number of training utterances are available for each speaker.  ...  CONCLUSIONS In this work we investigated the performance of deep reinforcement learning in audio-only navigation in a two-dimensional space containing speakers as audio sources.  ... 
arXiv:2105.04488v1 fatcat:zccnole5j5anrenwswbm5zxes4
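The abstract above describes an agent navigating a 2-D space from raw audio alone. A minimal sketch of such an environment interface is below; the class name, tone-per-speaker signals, distance attenuation, step size, and reward shaping are all illustrative assumptions, not details from the paper:

```python
import numpy as np

# Minimal sketch of a 2-D audio-navigation environment: the observation is a
# raw audio frame mixed from speaker sources, attenuated by distance.
class TwoDAudioEnv:
    def __init__(self, n_speakers=2, sr=16000, seed=0):
        self.rng = np.random.default_rng(seed)
        self.speakers = self.rng.uniform(-1, 1, size=(n_speakers, 2))
        self.sr = sr
        self.agent = np.zeros(2)
        self.target = 0                  # index of the speaker to reach

    def _observe(self):
        # Mix one 10 ms frame: each speaker emits a tone, scaled by 1/distance
        t = np.arange(self.sr // 100) / self.sr
        obs = np.zeros_like(t)
        for i, pos in enumerate(self.speakers):
            d = np.linalg.norm(self.agent - pos) + 1e-3
            obs += np.sin(2 * np.pi * 220 * (i + 1) * t) / d
        return obs

    def step(self, action):
        # action: 0..3 -> move up/down/left/right by a fixed step
        moves = np.array([[0, .1], [0, -.1], [-.1, 0], [.1, 0]])
        self.agent = np.clip(self.agent + moves[action], -1, 1)
        dist = np.linalg.norm(self.agent - self.speakers[self.target])
        reward = 1.0 if dist < 0.15 else -0.01
        return self._observe(), reward, dist < 0.15
```

A policy network would then consume the raw frame returned by `step`, e.g. `obs, reward, done = TwoDAudioEnv().step(0)`.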

A Deep Reinforcement Learning Approach for Audio-based Navigation and Audio Source Localization in Multi-speaker Environments [article]

Petros Giannakopoulos, Aggelos Pikrakis, Yannis Cotronis
2021 arXiv   pre-print
In this work we apply deep reinforcement learning to the problems of navigating a three-dimensional environment and inferring the locations of human speaker audio sources within, in the case where the  ...  We also create an autonomous agent based on the PPO online reinforcement learning algorithm and attempt to train it to solve these environments.  ...  are human speakers, using an approach based on online Deep Reinforcement Learning.  ... 
arXiv:2110.12778v3 fatcat:abrnq4gjmfenlkvmp7dmctvy6e

OtoWorld: Towards Learning to Separate by Learning to Move [article]

Omkar Ranadive, Grant Gasser, David Terpay, Prem Seetharaman
2020 arXiv   pre-print
We present OtoWorld, an interactive environment in which agents must learn to listen in order to solve navigational tasks.  ...  The purpose of OtoWorld is to facilitate reinforcement learning research in computer audition, where agents must learn to listen to the world around them to navigate.  ...  Further, our goal in OtoWorld is to provide software in which researchers can easily try tasks like echolocation, source localization, and audio-based navigation.  ... 
arXiv:2007.06123v1 fatcat:tham6qwkyzcmzmb5verpgnlcme

2020 Index IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 28

2020 IEEE/ACM Transactions on Audio, Speech, and Language Processing  
., +, TASLP 2020 185-197 DeepMMSE: A Deep Learning Approach to MMSE-Based Noise Power Spectral Density Estimation.  ...  ., +, TASLP 2020 941-950 DeepMMSE: A Deep Learning Approach to MMSE-Based Noise Power Spectral Density Estimation.  ...  T Target tracking Multi-Hypothesis Square-Root Cubature Kalman Particle Filter for Speaker Tracking in Noisy and Reverberant Environments. Zhang, Q., +, TASLP 2020 1183-1197  ... 
doi:10.1109/taslp.2021.3055391 fatcat:7vmstynfqvaprgz6qy3ekinkt4

SoundSpaces: Audio-Visual Navigation in 3D Environments [article]

Changan Chen, Unnat Jain, Carl Schissler, Sebastia Vicenc Amengual Gari, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman
2020 arXiv   pre-print
We propose a multi-modal deep reinforcement learning approach to train navigation policies end-to-end from a stream of egocentric audio-visual observations, allowing the agent to (1) discover elements  ...  We introduce audio-visual navigation for complex, acoustically and visually realistic 3D environments. By both seeing and hearing, the agent must learn to navigate to a sounding object.  ...  Acknowledgements UT Austin is supported in part by DARPA Lifelong Learning Machines.  ... 
arXiv:1912.11474v3 fatcat:vidyc3jrzzeofdnadv7t2xzi5q

Deep Learning for Embodied Vision Navigation: A Survey [article]

Fengda Zhu, Yi Zhu, Vincent CS Lee, Xiaodan Liang, Xiaojun Chang
2021 arXiv   pre-print
The "embodied visual navigation" problem requires an agent to navigate in a 3D environment relying mainly on its first-person observations.  ...  The remarkable learning ability of deep learning methods has empowered agents to accomplish embodied visual navigation tasks.  ...  [15] first propose to use deep learning for feature matching and deep reinforcement learning for policy prediction, which allows the agent to generalize better to unseen environments.  ... 
arXiv:2108.04097v4 fatcat:46p2p3zlivabbn7dvowkyccufe

Move2Hear: Active Audio-Visual Source Separation [article]

Sagnik Majumder, Ziad Al-Halah, Kristen Grauman
2021 arXiv   pre-print
Towards this goal, we introduce a reinforcement learning approach that trains movement policies controlling the agent's camera and microphone placement over time, guided by the improvement in predicted  ...  We introduce the active audio-visual source separation problem, where an agent must move intelligently in order to better isolate the sounds coming from an object of interest in its environment.  ...  Acknowledgements: UT Austin is supported in part by DARPA L2M and the IFML NSF AI Institute. K.G. is paid as a Research Scientist by Facebook AI.  ... 
arXiv:2105.07142v2 fatcat:e5hwnxd3cjc7zclxmmr3jji3ti

Hierarchical RNNs-Based Transformers MADDPG for Mixed Cooperative-Competitive Environments [article]

Xiaolong Wei, LiFang Yang, Xianglin Huang, Gang Cao, Tao Zhulin, Zhengyang Du, Jing An
2021 arXiv   pre-print
MARL (Multi-Agent Reinforcement Learning) can be viewed as a set of independent agents trying to adapt and learn their way toward a goal.  ...  At present, the attention mechanism is widely applied in deep learning models.  ...  One of the big challenges in the field of Reinforcement Learning (RL) is to develop an efficient swarm-intelligence-based multi-agent system and to optimize the tasks involved [10] .  ... 
arXiv:2105.04888v1 fatcat:m6bciz74bvh7vkw6fyzdes6mwe

Deep Learning and Reinforcement Learning for Autonomous Unmanned Aerial Systems: Roadmap for Theory to Deployment [article]

Jithin Jagannath, Anu Jagannath, Sean Furman, Tyler Gwin
2020 arXiv   pre-print
Therefore, in this chapter, we discuss how some of the advances in machine learning, specifically deep learning and reinforcement learning can be leveraged to develop next-generation autonomous UAS.  ...  Accordingly, we discuss how deep learning approaches have been used to accomplish some of the basic tasks that contribute to providing UAS autonomy.  ...  In model-based reinforcement learning, the agent attempts to learn a model of the environment directly, by learning P and R, and then using the environmental model to plan actions using algorithms similar  ... 
arXiv:2009.03349v2 fatcat:5ylreoukrfcrtorzzp44mntjum
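The snippet above describes model-based RL: learn the transition model P and reward model R from experience, then plan actions against the learned model. A minimal tabular sketch is below; the toy state/action counts, experience tuples, and the value-iteration planner are illustrative assumptions, not content from the chapter:

```python
import numpy as np

# Tabular model-based RL: estimate P and R from (s, a, r, s') transitions,
# then plan with value iteration on the learned model.
n_states, n_actions, gamma = 3, 2, 0.9

counts = np.zeros((n_states, n_actions, n_states))   # N(s, a, s')
reward_sum = np.zeros((n_states, n_actions))

experience = [
    (0, 0, 0.0, 1), (1, 0, 0.0, 2), (2, 1, 1.0, 2),
    (0, 1, 0.0, 0), (1, 1, 1.0, 2), (2, 0, 1.0, 2),
]
for s, a, r, s2 in experience:
    counts[s, a, s2] += 1
    reward_sum[s, a] += r

visits = counts.sum(axis=2)                          # N(s, a)
P = counts / np.maximum(visits, 1)[:, :, None]       # learned transition model
R = reward_sum / np.maximum(visits, 1)               # learned reward model

# Planning step: value iteration on the learned (P, R)
V = np.zeros(n_states)
for _ in range(100):
    Q = R + gamma * (P @ V)   # Q[s, a] = R(s, a) + gamma * sum_s' P(s'|s,a) V(s')
    V = Q.max(axis=1)
policy = Q.argmax(axis=1)
```

With the toy experience above, state 2 is self-absorbing with reward 1, so the planner learns to route states 0 and 1 toward it.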

A Deep Reinforcement Learning Blind AI in DareFightingICE [article]

Thai Van Nguyen, Xincheng Dai, Ibrahim Khan, Ruck Thawonmas, Hai V. Pham
2022 arXiv   pre-print
This paper presents a deep reinforcement learning AI that uses sound as the input on the DareFightingICE platform at the DareFightingICE Competition in IEEE CoG 2022.  ...  We propose different approaches to process audio data and use the Proximal Policy Optimization algorithm for our blind AI.  ...  In addition, PPO performs well in audio processing tasks, such as audio-based navigation in a multi-speaker environment [17] and semantic audio-visual navigation, where objects' sounds are consistent  ... 
arXiv:2205.07444v1 fatcat:taipi5fe2vg3dex7dy7w3cd7u4

A Review on Voice-based Interface for Human-Robot Interaction

Ameer Badr, Alia Abdul-Hassan
2020 Iraqi Journal for Electrical And Electronic Engineering  
In this work, a review of the voice-based interface for HRI systems has been presented.  ...  The voice-based interface robot can recognize the speech information from humans so that it will be able to interact more naturally with its human counterpart in different environments.  ...  Supervised machine learning algorithms can be replaced by Reinforcement-based machine learning algorithms, which will, in turn, bring about progress in areas where large data sets are not available, as  ... 
doi:10.37917/ijeee.16.2.10 fatcat:crz5ieseo5g5nmcllbpm5oz22e

Dynamical Audio-Visual Navigation: Catching Unheard Moving Sound Sources in Unmapped 3D Environments [article]

Abdelrahman Younes
2022 arXiv   pre-print
We propose an end-to-end reinforcement learning approach that relies on a multi-modal architecture that fuses the spatial audio-visual information from a binaural audio signal and spatial occupancy maps  ...  Recent work on audio-visual navigation targets a single static sound in noise-free audio environments and struggles to generalize to unheard sounds.  ...  Finally, the authors of [7] have proposed a multi-modal reinforcement learning approach to train the agent to navigate towards the sound emitting source using only audio and visual observations.  ... 
arXiv:2201.04279v1 fatcat:4bg7ziyxhjhsjkud4i6uep5ifm
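The abstract above describes a multi-modal architecture fusing binaural audio features with spatial occupancy maps before the policy. A minimal late-fusion forward pass is sketched below; the feature dimensions, single-layer encoders, and four-action policy head are illustrative assumptions standing in for the paper's networks:

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, w, b):
    # One linear layer with ReLU, standing in for the audio/map CNN encoders
    return np.maximum(w @ x + b, 0.0)

# Illustrative inputs: flattened binaural spectrogram and occupancy-map crop
audio_feat = rng.standard_normal(64)
map_feat = rng.standard_normal(100)

# Each modality is embedded to 16-d, then the embeddings are concatenated
w_a, b_a = rng.standard_normal((16, 64)) * 0.1, np.zeros(16)
w_m, b_m = rng.standard_normal((16, 100)) * 0.1, np.zeros(16)
w_pi = rng.standard_normal((4, 32)) * 0.1        # policy head, 4 actions

fused = np.concatenate([encoder(audio_feat, w_a, b_a),
                        encoder(map_feat, w_m, b_m)])
logits = w_pi @ fused
probs = np.exp(logits - logits.max())
probs /= probs.sum()                             # action distribution
```

Concatenation is the simplest fusion choice; attention-based fusion over the two embeddings is a common alternative when one modality should dominate per step.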

Paper Titles

2019 2019 IEEE 8th Global Conference on Consumer Electronics (GCCE)  
Deep Learning for Multi-Resident Activity Recognition in Ambient Sensing Smart Homes; Deep Monocular Depth Estimation in Partially-Known Environments; Deep Neural Networks Based Invisible Steganography  ...  of Mean Delay Between Paths in MPTCP by SDN; Proposal of a New Application Method of 4K Editing Technology to Sports Video; Proposal of Allocating Radio Resources to Multiple Slices in 5G Using Deep Reinforcement  ... 
doi:10.1109/gcce46687.2019.9015409 fatcat:6k3r6jixrvglrkrkzek636gb54

Embodied AI-Driven Operation of Smart Cities: A Concise Review [article]

Farzan Shenavarmasouleh, Farid Ghareh Mohammadi, M. Hadi Amini, Hamid R. Arabnia
2021 arXiv   pre-print
Embodied AI aims to train an agent that can See (Computer Vision), Talk (NLP), Navigate and Interact with its environment (Reinforcement Learning), and Reason (General Intelligence), all at the same time  ...  It focuses on learning through interaction with the surrounding environment, as opposed to Internet AI which tries to learn from static datasets.  ...  Target-driven visual navigation in indoor scenes using deep reinforcement learning. In 2017 IEEE international conference on robotics and automation (ICRA), pages 3357–3364.  ... 
arXiv:2108.09823v1 fatcat:xcjyq2ad3jgbborpldopgcd3vm

Learning Visual-Audio Representations for Voice-Controlled Robots [article]

Peixin Chang, Shuijing Liu, Katherine Driggs-Campbell
2022 arXiv   pre-print
We successfully deploy the policy learned in a simulator to a real Kinova Gen3.  ...  To address these problems, we learn a visual-audio representation (VAR) that associates images and sound commands with minimal supervision.  ...  The robot's goal is to navigate to the object mentioned in a command based on RGB images.  ... 
arXiv:2109.02823v2 fatcat:eylagsjz5vckzjftyoztnydwuy
Showing results 1 — 15 out of 4,880 results