4,880 Hits in 5.3 sec

A Deep Reinforcement Learning Approach to Audio-Based Navigation in a Multi-Speaker Environment [article]

Petros Giannakopoulos, Aggelos Pikrakis, Yannis Cotronis
2021 arXiv   pre-print
In this work we use deep reinforcement learning to create an autonomous agent that can navigate in a two-dimensional space using only raw auditory sensory information from the environment, a problem that  ...  The agent is shown to be robust to speaker pitch shifting and it can learn to navigate the environment even when a limited number of training utterances are available for each speaker.  ...  CONCLUSIONS In this work we investigated the performance of deep reinforcement learning in audio-only navigation in a two-dimensional space containing speakers as audio sources.  ... 
arXiv:2105.04488v1 fatcat:zccnole5j5anrenwswbm5zxes4
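The abstract above describes an agent navigating a 2-D space from raw audio alone. A minimal sketch of such an environment interface is below; the class name, tone-per-speaker signals, distance attenuation, step size, and reward shaping are all illustrative assumptions, not details from the paper:

```python
import numpy as np

# Minimal sketch of a 2-D audio-navigation environment: the observation is a
# raw audio frame mixed from speaker sources, attenuated by distance.
class TwoDAudioEnv:
    def __init__(self, n_speakers=2, sr=16000, seed=0):
        self.rng = np.random.default_rng(seed)
        self.speakers = self.rng.uniform(-1, 1, size=(n_speakers, 2))
        self.sr = sr
        self.agent = np.zeros(2)
        self.target = 0                  # index of the speaker to reach

    def _observe(self):
        # Mix one 10 ms frame: each speaker emits a tone, scaled by 1/distance
        t = np.arange(self.sr // 100) / self.sr
        obs = np.zeros_like(t)
        for i, pos in enumerate(self.speakers):
            d = np.linalg.norm(self.agent - pos) + 1e-3
            obs += np.sin(2 * np.pi * 220 * (i + 1) * t) / d
        return obs

    def step(self, action):
        # action: 0..3 -> move up/down/left/right by a fixed step
        moves = np.array([[0, .1], [0, -.1], [-.1, 0], [.1, 0]])
        self.agent = np.clip(self.agent + moves[action], -1, 1)
        dist = np.linalg.norm(self.agent - self.speakers[self.target])
        reward = 1.0 if dist < 0.15 else -0.01
        return self._observe(), reward, dist < 0.15
```

A policy network would then consume the raw frame returned by `step`, e.g. `obs, reward, done = TwoDAudioEnv().step(0)`.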

A Deep Reinforcement Learning Approach for Audio-based Navigation and Audio Source Localization in Multi-speaker Environments [article]

Petros Giannakopoulos, Aggelos Pikrakis, Yannis Cotronis
2021 arXiv   pre-print
In this work we apply deep reinforcement learning to the problems of navigating a three-dimensional environment and inferring the locations of human speaker audio sources within, in the case where the  ...  We also create an autonomous agent based on the PPO online reinforcement learning algorithm and attempt to train it to solve these environments.  ...  are human speakers, using an approach based on online Deep Reinforcement Learning.  ... 
arXiv:2110.12778v3 fatcat:abrnq4gjmfenlkvmp7dmctvy6e

OtoWorld: Towards Learning to Separate by Learning to Move [article]

Omkar Ranadive, Grant Gasser, David Terpay, Prem Seetharaman
2020 arXiv   pre-print
We present OtoWorld, an interactive environment in which agents must learn to listen in order to solve navigational tasks.  ...  The purpose of OtoWorld is to facilitate reinforcement learning research in computer audition, where agents must learn to listen to the world around them to navigate.  ...  Further, our goal in OtoWorld is to provide software in which researchers can easily try tasks like echolocation, source localization, and audio-based navigation.  ... 
arXiv:2007.06123v1 fatcat:tham6qwkyzcmzmb5verpgnlcme

2020 Index IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 28

2020 IEEE/ACM Transactions on Audio, Speech, and Language Processing  
., +, TASLP 2020 185-197 DeepMMSE: A Deep Learning Approach to MMSE-Based Noise Power Spectral Density Estimation.  ...  ., +, TASLP 2020 941-950 DeepMMSE: A Deep Learning Approach to MMSE-Based Noise Power Spectral Density Estimation.  ...  T Target tracking Multi-Hypothesis Square-Root Cubature Kalman Particle Filter for Speaker Tracking in Noisy and Reverberant Environments. Zhang, Q., +, TASLP 2020 1183-1197  ... 
doi:10.1109/taslp.2021.3055391 fatcat:7vmstynfqvaprgz6qy3ekinkt4

SoundSpaces: Audio-Visual Navigation in 3D Environments [article]

Changan Chen, Unnat Jain, Carl Schissler, Sebastia Vicenc Amengual Gari, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman
2020 arXiv   pre-print
We propose a multi-modal deep reinforcement learning approach to train navigation policies end-to-end from a stream of egocentric audio-visual observations, allowing the agent to (1) discover elements  ...  We introduce audio-visual navigation for complex, acoustically and visually realistic 3D environments. By both seeing and hearing, the agent must learn to navigate to a sounding object.  ...  Acknowledgements UT Austin is supported in part by DARPA Lifelong Learning Machines.  ... 
arXiv:1912.11474v3 fatcat:vidyc3jrzzeofdnadv7t2xzi5q

Deep Learning for Embodied Vision Navigation: A Survey [article]

Fengda Zhu, Yi Zhu, Vincent CS Lee, Xiaodan Liang, Xiaojun Chang
2021 arXiv   pre-print
The "embodied visual navigation" problem requires an agent to navigate in a 3D environment relying mainly on its first-person observations.  ...  The remarkable learning ability of deep learning methods has empowered agents to accomplish embodied visual navigation tasks.  ...  [15] first propose to use deep learning for feature matching and deep reinforcement learning for policy prediction, which allows the agent to generalize better to unseen environments.  ... 
arXiv:2108.04097v4 fatcat:46p2p3zlivabbn7dvowkyccufe

Move2Hear: Active Audio-Visual Source Separation [article]

Sagnik Majumder, Ziad Al-Halah, Kristen Grauman
2021 arXiv   pre-print
Towards this goal, we introduce a reinforcement learning approach that trains movement policies controlling the agent's camera and microphone placement over time, guided by the improvement in predicted  ...  We introduce the active audio-visual source separation problem, where an agent must move intelligently in order to better isolate the sounds coming from an object of interest in its environment.  ...  Acknowledgements: UT Austin is supported in part by DARPA L2M and the IFML NSF AI Institute. K.G. is paid as a Research Scientist by Facebook AI.  ... 
arXiv:2105.07142v2 fatcat:e5hwnxd3cjc7zclxmmr3jji3ti

Hierarchical RNNs-Based Transformers MADDPG for Mixed Cooperative-Competitive Environments [article]

Xiaolong Wei, LiFang Yang, Xianglin Huang, Gang Cao, Tao Zhulin, Zhengyang Du, Jing An
2021 arXiv   pre-print
MARL (Multi-Agent Reinforcement Learning) can be viewed as a set of independent agents trying to adapt and learn their way toward a goal.  ...  At present, the attention mechanism is widely applied in deep learning models.  ...  One of the big challenges in the field of Reinforcement Learning (RL) is to develop an efficient swarm-intelligence-based multi-agent system and to optimize the tasks involved [10] .  ... 
arXiv:2105.04888v1 fatcat:m6bciz74bvh7vkw6fyzdes6mwe

Deep Learning and Reinforcement Learning for Autonomous Unmanned Aerial Systems: Roadmap for Theory to Deployment [article]

Jithin Jagannath, Anu Jagannath, Sean Furman, Tyler Gwin
2020 arXiv   pre-print
Therefore, in this chapter, we discuss how some of the advances in machine learning, specifically deep learning and reinforcement learning can be leveraged to develop next-generation autonomous UAS.  ...  Accordingly, we discuss how deep learning approaches have been used to accomplish some of the basic tasks that contribute to providing UAS autonomy.  ...  In model-based reinforcement learning, the agent attempts to learn a model of the environment directly, by learning P and R, and then using the environmental model to plan actions using algorithms similar  ... 
arXiv:2009.03349v2 fatcat:5ylreoukrfcrtorzzp44mntjum
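The snippet above describes model-based RL: learn the transition model P and reward model R from experience, then plan actions against the learned model. A minimal tabular sketch is below; the toy state/action counts, experience tuples, and the value-iteration planner are illustrative assumptions, not content from the chapter:

```python
import numpy as np

# Tabular model-based RL: estimate P and R from (s, a, r, s') transitions,
# then plan with value iteration on the learned model.
n_states, n_actions, gamma = 3, 2, 0.9

counts = np.zeros((n_states, n_actions, n_states))   # N(s, a, s')
reward_sum = np.zeros((n_states, n_actions))

experience = [
    (0, 0, 0.0, 1), (1, 0, 0.0, 2), (2, 1, 1.0, 2),
    (0, 1, 0.0, 0), (1, 1, 1.0, 2), (2, 0, 1.0, 2),
]
for s, a, r, s2 in experience:
    counts[s, a, s2] += 1
    reward_sum[s, a] += r

visits = counts.sum(axis=2)                          # N(s, a)
P = counts / np.maximum(visits, 1)[:, :, None]       # learned transition model
R = reward_sum / np.maximum(visits, 1)               # learned reward model

# Planning step: value iteration on the learned (P, R)
V = np.zeros(n_states)
for _ in range(100):
    Q = R + gamma * (P @ V)   # Q[s, a] = R(s, a) + gamma * sum_s' P(s'|s,a) V(s')
    V = Q.max(axis=1)
policy = Q.argmax(axis=1)
```

With the toy experience above, state 2 is self-absorbing with reward 1, so the planner learns to route states 0 and 1 toward it.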

A Deep Reinforcement Learning Blind AI in DareFightingICE [article]

Thai Van Nguyen, Xincheng Dai, Ibrahim Khan, Ruck Thawonmas, Hai V. Pham
2022 arXiv   pre-print
This paper presents a deep reinforcement learning AI that uses sound as the input on the DareFightingICE platform at the DareFightingICE Competition in IEEE CoG 2022.  ...  We propose different approaches to process audio data and use the Proximal Policy Optimization algorithm for our blind AI.  ...  In addition, PPO performs well in audio processing tasks, such as audio-based navigation in a multi-speaker environment [17] and semantic audio-visual navigation, where objects' sounds are consistent  ... 
arXiv:2205.07444v1 fatcat:taipi5fe2vg3dex7dy7w3cd7u4

A Review on Voice-based Interface for Human-Robot Interaction

Ameer Badr, Alia Abdul-Hassan
2020 Iraqi Journal for Electrical And Electronic Engineering  
In this work, a review of the voice-based interface for HRI systems has been presented.  ...  The voice-based interface robot can recognize the speech information from humans so that it will be able to interact more naturally with its human counterpart in different environments.  ...  Supervised machine learning algorithms can be replaced by Reinforcement-based machine learning algorithms, which will, in turn, bring about progress in areas where large data sets are not available, as  ... 
doi:10.37917/ijeee.16.2.10 fatcat:crz5ieseo5g5nmcllbpm5oz22e

Dynamical Audio-Visual Navigation: Catching Unheard Moving Sound Sources in Unmapped 3D Environments [article]

Abdelrahman Younes
2022 arXiv   pre-print
We propose an end-to-end reinforcement learning approach that relies on a multi-modal architecture that fuses the spatial audio-visual information from a binaural audio signal and spatial occupancy maps  ...  Recent work on audio-visual navigation targets a single static sound in noise-free audio environments and struggles to generalize to unheard sounds.  ...  Finally, the authors of [7] have proposed a multi-modal reinforcement learning approach to train the agent to navigate towards the sound emitting source using only audio and visual observations.  ... 
arXiv:2201.04279v1 fatcat:4bg7ziyxhjhsjkud4i6uep5ifm
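The abstract above describes a multi-modal architecture fusing binaural audio features with spatial occupancy maps before the policy. A minimal late-fusion forward pass is sketched below; the feature dimensions, single-layer encoders, and four-action policy head are illustrative assumptions standing in for the paper's networks:

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(x, w, b):
    # One linear layer with ReLU, standing in for the audio/map CNN encoders
    return np.maximum(w @ x + b, 0.0)

# Illustrative inputs: flattened binaural spectrogram and occupancy-map crop
audio_feat = rng.standard_normal(64)
map_feat = rng.standard_normal(100)

# Each modality is embedded to 16-d, then the embeddings are concatenated
w_a, b_a = rng.standard_normal((16, 64)) * 0.1, np.zeros(16)
w_m, b_m = rng.standard_normal((16, 100)) * 0.1, np.zeros(16)
w_pi = rng.standard_normal((4, 32)) * 0.1        # policy head, 4 actions

fused = np.concatenate([encoder(audio_feat, w_a, b_a),
                        encoder(map_feat, w_m, b_m)])
logits = w_pi @ fused
probs = np.exp(logits - logits.max())
probs /= probs.sum()                             # action distribution
```

Concatenation is the simplest fusion choice; attention-based fusion over the two embeddings is a common alternative when one modality should dominate per step.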

Paper Titles

2019 2019 IEEE 8th Global Conference on Consumer Electronics (GCCE)  
Deep Learning for Multi-Resident Activity Recognition in Ambient Sensing Smart Homes; Deep Monocular Depth Estimation in Partially-Known Environments; Deep Neural Networks Based Invisible Steganography  ...  of Mean Delay Between Paths in MPTCP by SDN; Proposal of a New Application Method of 4K Editing Technology to Sports Video; Proposal of Allocating Radio Resources to Multiple Slices in 5G Using Deep Reinforcement  ... 
doi:10.1109/gcce46687.2019.9015409 fatcat:6k3r6jixrvglrkrkzek636gb54

Embodied AI-Driven Operation of Smart Cities: A Concise Review [article]

Farzan Shenavarmasouleh, Farid Ghareh Mohammadi, M. Hadi Amini, Hamid R. Arabnia
2021 arXiv   pre-print
Embodied AI aims to train an agent that can See (Computer Vision), Talk (NLP), Navigate and Interact with its environment (Reinforcement Learning), and Reason (General Intelligence), all at the same time  ...  It focuses on learning through interaction with the surrounding environment, as opposed to Internet AI which tries to learn from static datasets.  ...  Target-driven visual navigation in indoor scenes using deep reinforcement learning. In 2017 IEEE international conference on robotics and automation (ICRA), pages 3357–3364.  ... 
arXiv:2108.09823v1 fatcat:xcjyq2ad3jgbborpldopgcd3vm

Learning Visual-Audio Representations for Voice-Controlled Robots [article]

Peixin Chang, Shuijing Liu, Katherine Driggs-Campbell
2022 arXiv   pre-print
We successfully deploy the policy learned in a simulator to a real Kinova Gen3.  ...  To address these problems, we learn a visual-audio representation (VAR) that associates images and sound commands with minimal supervision.  ...  The robot's goal is to navigate to the object mentioned in a command based on RGB images.  ... 
arXiv:2109.02823v2 fatcat:eylagsjz5vckzjftyoztnydwuy
Showing results 1 — 15 out of 4,880 results