4,547 Hits in 9.1 sec

A Deep Reinforcement Learning Approach for Audio-based Navigation and Audio Source Localization in Multi-speaker Environments [article]

Petros Giannakopoulos, Aggelos Pikrakis, Yannis Cotronis
2021 arXiv   pre-print
In this work we apply deep reinforcement learning to the problems of navigating a three-dimensional environment and inferring the locations of human speaker audio sources within, in the case where the  ...  For this purpose we create two virtual environments using the Unity game engine, one presenting an audio-based navigation problem and one presenting an audio source localization problem.  ...  are human speakers, using an approach based on online Deep Reinforcement Learning.  ... 
arXiv:2110.12778v3 fatcat:abrnq4gjmfenlkvmp7dmctvy6e
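As a rough illustration of the kind of agent described in the entry above, the sketch below maps a short audio spectrogram observation to Q-values over a small set of discrete movement actions with a convolutional network. It is a minimal, hypothetical example: the class name, observation shape, and action set are assumptions for illustration, not the authors' architecture.

```python
# Minimal sketch of a DQN-style policy over audio spectrogram observations.
# Hypothetical shapes and names; not the architecture from the cited paper.
import torch
import torch.nn as nn

class AudioQNetwork(nn.Module):
    def __init__(self, n_actions: int = 4, n_mels: int = 64, n_frames: int = 32):
        super().__init__()
        # Treat the (n_mels x n_frames) spectrogram as a 1-channel image.
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():
            feat_dim = self.features(torch.zeros(1, 1, n_mels, n_frames)).shape[1]
        self.head = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                  nn.Linear(128, n_actions))

    def forward(self, spectrogram: torch.Tensor) -> torch.Tensor:
        # spectrogram: (batch, 1, n_mels, n_frames) -> Q-values: (batch, n_actions)
        return self.head(self.features(spectrogram))

if __name__ == "__main__":
    q_net = AudioQNetwork()
    obs = torch.randn(1, 1, 64, 32)           # stand-in for a log-mel observation
    action = q_net(obs).argmax(dim=1).item()  # greedy action, e.g. forward/back/left/right
    print("chosen action index:", action)
```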

A Deep Reinforcement Learning Approach to Audio-Based Navigation in a Multi-Speaker Environment [article]

Petros Giannakopoulos, Aggelos Pikrakis, Yannis Cotronis
2021 arXiv   pre-print
In this work we use deep reinforcement learning to create an autonomous agent that can navigate in a two-dimensional space using only raw auditory sensory information from the environment, a problem that  ...  The agent is shown to be robust to speaker pitch shifting and it can learn to navigate the environment, even when a limited number of training utterances are available for each speaker.  ...  CONCLUSIONS In this work we investigated the performance of deep reinforcement learning in audio-only navigation in a two-dimensional space containing speakers as audio sources.  ... 
arXiv:2105.04488v1 fatcat:zccnole5j5anrenwswbm5zxes4

OtoWorld: Towards Learning to Separate by Learning to Move [article]

Omkar Ranadive, Grant Gasser, David Terpay, Prem Seetharaman
2020 arXiv   pre-print
The sources are placed randomly within the room and can vary in number. The agent receives a reward for turning off a source.  ...  OtoWorld is the audio analogue of GridWorld, a simple navigation game. OtoWorld can be easily extended to more complex environments and games.  ...  Further, our goal in OtoWorld is to provide software in which researchers can easily try tasks like echolocation, source localization, and audio-based navigation.  ... 
arXiv:2007.06123v1 fatcat:tham6qwkyzcmzmb5verpgnlcme
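The entry above describes a GridWorld-like audio game in which sources are placed randomly and the agent is rewarded for turning one off. The toy environment below captures that loop in miniature; it is a sketch in the spirit of the description, not the actual OtoWorld API, and the observation model (distance-attenuated per-source loudness) is an assumption.

```python
# Toy grid environment in the spirit of the description above: sources are placed
# randomly, the observation is a distance-attenuated "loudness" per source, and the
# agent is rewarded for turning a source off. Not the actual OtoWorld API.
import random

class ToyAudioGrid:
    ACTIONS = {0: (0, 1), 1: (0, -1), 2: (1, 0), 3: (-1, 0)}  # right, left, down, up

    def __init__(self, size=5, n_sources=2, seed=0):
        random.seed(seed)
        self.size = size
        self.agent = (0, 0)
        self.sources = {(random.randrange(size), random.randrange(size))
                        for _ in range(n_sources)}

    def _observe(self):
        # One "loudness" value per active source, decaying with Manhattan distance.
        return [1.0 / (1 + abs(self.agent[0] - r) + abs(self.agent[1] - c))
                for r, c in self.sources]

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        r = min(max(self.agent[0] + dr, 0), self.size - 1)
        c = min(max(self.agent[1] + dc, 0), self.size - 1)
        self.agent = (r, c)
        reward = 0.0
        if self.agent in self.sources:        # reaching a source "turns it off"
            self.sources.remove(self.agent)
            reward = 1.0
        done = not self.sources
        return self._observe(), reward, done

if __name__ == "__main__":
    env = ToyAudioGrid()
    done = False
    while not done:                           # random policy, just to show the loop
        obs, reward, done = env.step(random.randrange(4))
```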

2020 Index IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 28

2020 IEEE/ACM Transactions on Audio Speech and Language Processing  
Source Localization Using Distributed Microphones in Reverberant Environments Based on Deep Learning and Ray Space Transform. TASLP 2020, 1875-1887.  ...  Target tracking: Multi-Hypothesis Square-Root Cubature Kalman Particle Filter for Speaker Tracking in Noisy and Reverberant Environments. Zhang, Q., +, TASLP 2020, 1183-1197.  ... 
doi:10.1109/taslp.2021.3055391 fatcat:7vmstynfqvaprgz6qy3ekinkt4

SoundSpaces: Audio-Visual Navigation in 3D Environments [article]

Changan Chen, Unnat Jain, Carl Schissler, Sebastia Vicenc Amengual Gari, Ziad Al-Halah, Vamsi Krishna Ithapu, Philip Robinson, Kristen Grauman
2020 arXiv   pre-print
We propose a multi-modal deep reinforcement learning approach to train navigation policies end-to-end from a stream of egocentric audio-visual observations, allowing the agent to (1) discover elements  ...  We introduce audio-visual navigation for complex, acoustically and visually realistic 3D environments. By both seeing and hearing, the agent must learn to navigate to a sounding object.  ...  Acknowledgements UT Austin is supported in part by DARPA Lifelong Learning Machines.  ... 
arXiv:1912.11474v3 fatcat:vidyc3jrzzeofdnadv7t2xzi5q
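The entry above trains navigation policies end-to-end from egocentric audio-visual observations. The sketch below shows one common way to realize such multi-modal fusion: separate encoders for an RGB frame and a two-channel (binaural) spectrogram, whose features are concatenated and fed to actor/critic heads. Layer sizes, input shapes, and the fusion-by-concatenation choice are illustrative assumptions, not the SoundSpaces implementation.

```python
# Minimal sketch of multi-modal fusion for audio-visual navigation: separate encoders
# for an egocentric RGB frame and a binaural spectrogram, concatenated and fed to
# actor/critic heads. Shapes and layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class AudioVisualPolicy(nn.Module):
    def __init__(self, n_actions: int = 4):
        super().__init__()
        self.rgb_enc = nn.Sequential(
            nn.Conv2d(3, 32, 8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())             # -> 64-d visual feature
        self.audio_enc = nn.Sequential(
            nn.Conv2d(2, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())             # -> 64-d audio feature
        self.actor = nn.Linear(128, n_actions)                  # action logits
        self.critic = nn.Linear(128, 1)                         # state-value estimate

    def forward(self, rgb, spec):
        fused = torch.cat([self.rgb_enc(rgb), self.audio_enc(spec)], dim=1)
        return self.actor(fused), self.critic(fused)

if __name__ == "__main__":
    policy = AudioVisualPolicy()
    rgb = torch.randn(1, 3, 128, 128)       # egocentric RGB frame
    spec = torch.randn(1, 2, 64, 32)        # left/right spectrogram channels
    logits, value = policy(rgb, spec)
    print(logits.shape, value.shape)        # (1, 4) and (1, 1)
```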

Move2Hear: Active Audio-Visual Source Separation [article]

Sagnik Majumder, Ziad Al-Halah, Kristen Grauman
2021 arXiv   pre-print
Towards this goal, we introduce a reinforcement learning approach that trains movement policies controlling the agent's camera and microphone placement over time, guided by the improvement in predicted  ...  Using state-of-the-art realistic audio-visual simulations in 3D environments, we demonstrate our model's ability to find minimal movement sequences with maximal payoff for audio source separation.  ...  Acknowledgements: UT Austin is supported in part by DARPA L2M and the IFML NSF AI Institute. K.G. is paid as a Research Scientist by Facebook AI.  ... 
arXiv:2105.07142v2 fatcat:e5hwnxd3cjc7zclxmmr3jji3ti
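The entry above rewards movement by the improvement in predicted separation quality. A simple way to express that idea is to score the current separated estimate against a reference with SI-SDR and reward the step-to-step gain, as in the sketch below; the choice of SI-SDR and this exact reward shaping are illustrative assumptions, not the paper's reward definition.

```python
# Sketch of "reward = improvement in separation quality", using SI-SDR between a
# separated estimate and a reference signal as the quality metric. The metric choice
# and reward shaping here are illustrative assumptions.
import numpy as np

def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant signal-to-distortion ratio in dB."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    projection = (np.dot(estimate, reference) / (np.dot(reference, reference) + eps)) * reference
    noise = estimate - projection
    return 10 * np.log10((np.dot(projection, projection) + eps) / (np.dot(noise, noise) + eps))

def separation_reward(prev_estimate, curr_estimate, reference):
    """Positive when the agent's movement improved the separated signal."""
    return si_sdr(curr_estimate, reference) - si_sdr(prev_estimate, reference)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)                        # 1 s of reference audio at 16 kHz
    noisy_before = clean + 0.5 * rng.standard_normal(16000)
    noisy_after = clean + 0.2 * rng.standard_normal(16000)    # "better" estimate after moving
    print("reward:", separation_reward(noisy_before, noisy_after, clean))
```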

Dynamical Audio-Visual Navigation: Catching Unheard Moving Sound Sources in Unmapped 3D Environments [article]

Abdelrahman Younes
2022 arXiv   pre-print
We propose an end-to-end reinforcement learning approach that relies on a multi-modal architecture that fuses the spatial audio-visual information from a binaural audio signal and spatial occupancy maps  ...  We introduce the novel dynamic audio-visual navigation benchmark in which an embodied AI agent must catch a moving sound source in an unmapped environment in the presence of distractors and noisy sounds  ...  Finally, the authors of [7] have proposed a multi-modal reinforcement learning approach to train the agent to navigate towards the sound emitting source using only audio and visual observations.  ... 
arXiv:2201.04279v1 fatcat:4bg7ziyxhjhsjkud4i6uep5ifm
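Because the target in the entry above moves, the policy needs memory on top of the fused binaural-audio and occupancy-map features. The sketch below adds a GRU over the fused features for that purpose; the encoders, channel counts, and recurrent design are assumptions made for illustration, not the paper's architecture.

```python
# Sketch of fusing binaural audio with a local occupancy map and keeping a recurrent
# state (GRU) so the policy can track a moving sound source over time. All shapes,
# channel counts, and names here are assumptions for illustration only.
import torch
import torch.nn as nn

class DynamicAudioNavPolicy(nn.Module):
    def __init__(self, n_actions: int = 4, hidden: int = 128):
        super().__init__()
        self.audio_enc = nn.Sequential(
            nn.Conv2d(2, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())             # binaural spectrogram -> 32-d
        self.map_enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())             # occupancy map -> 16-d
        self.gru = nn.GRUCell(32 + 16, hidden)                  # temporal memory of the target
        self.actor = nn.Linear(hidden, n_actions)

    def forward(self, spec, occ_map, h):
        fused = torch.cat([self.audio_enc(spec), self.map_enc(occ_map)], dim=1)
        h = self.gru(fused, h)
        return self.actor(h), h

if __name__ == "__main__":
    policy = DynamicAudioNavPolicy()
    h = torch.zeros(1, 128)
    for _ in range(3):                                          # a few timesteps
        spec = torch.randn(1, 2, 64, 32)                        # binaural spectrogram
        occ = torch.randn(1, 1, 64, 64)                         # local occupancy map
        logits, h = policy(spec, occ, h)
    print(logits.shape)                                         # (1, 4)
```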

Deep Learning for Embodied Vision Navigation: A Survey [article]

Fengda Zhu, Yi Zhu, Vincent CS Lee, Xiaodan Liang, Xiaojun Chang
2021 arXiv   pre-print
"Embodied visual navigation" problem requires an agent to navigate in a 3D environment mainly rely on its first-person observation.  ...  The remarkable learning ability of deep learning methods empowered the agents to accomplish embodied visual navigation tasks.  ...  [15] firstly propose to use deep learning for feature matching and deep reinforcement learning for policy prediction, which allows the agent to better generalize to unseen environments.  ... 
arXiv:2108.04097v4 fatcat:46p2p3zlivabbn7dvowkyccufe

Paper Titles

2019 2019 IEEE 8th Global Conference on Consumer Electronics (GCCE)  
Recognition in Ambient Sensing Smart Homes; Deep Monocular Depth Estimation in Partially-Known Environments; Deep Neural Networks Based Invisible Steganography for Audio-into-Image Algorithm; Deep-Learning  ...  Based on Deep Learning; Robust Reflection Removal Against Accumulated Error by Using Stereo Camera System; Route Control for Vehicle Access Point for Pedestrian Safe Reinforcement Learning in Continuous  ... 
doi:10.1109/gcce46687.2019.9015409 fatcat:6k3r6jixrvglrkrkzek636gb54

GCCE 2020 Subject Index

2020 2020 IEEE 9th Global Conference on Consumer Electronics (GCCE)  
Self-Attention Based Neural Network for Few Shot Classification; Separation of Multiple Sound Sources in the Same Direction by Instantaneous Spectral Estimation; Sequence-To-One Neural Networks for Japanese Dialect Speech Classification  ...  File Storage System; Multi-Plane Phase Detection Autofocus for 8K Three-Chip Color Imaging Cameras; Multi-Sound Source Localization in Time Domain Using Voting Mechanism; Multi-Sound Source Localization  ... 
doi:10.1109/gcce50665.2020.9291796 fatcat:bmnnn7xnxrefhaneq262fe4i6u

A Review on Path Selection and Navigation Approaches Towards an Assisted Mobility of Visually Impaired People

2020 KSII Transactions on Internet and Information Systems  
In the process, we explore machine learning approaches for robotic path planning, multi-constrained optimal path computation and sensor-based wearable assistive devices for the visually impaired.  ...  He is a highly cited and published scholar. He has specialized his research and teaching in machine learning, computer vision and deep learning.  ...  Computer vision and deep learning methods have the potential to provide assistance and solve these issues to some extent. • Real-time navigation: A classical navigation problem involves movement of  ... 
doi:10.3837/tiis.2020.08.007 fatcat:k3r6ceqxcbg7xkllthpox3ix3a

Review of end-to-end speech synthesis technology based on deep learning [article]

Zhaoxi Mu, Xinyu Yang, Yizhuo Dong
2021 arXiv   pre-print
Due to the limitations of high complexity and low efficiency of traditional speech synthesis technology, the current research focus is deep learning-based end-to-end speech synthesis technology, which  ...  Moreover, this paper also summarizes the open-source speech corpora of English, Chinese and other languages that can be used for speech synthesis tasks, and introduces some commonly used subjective and  ...  Multiple approaches have been proposed for this issue, including reinforcement learning [122, 233, 246], approximation by beam search [186], and approximation by soft attention for training [170]  ... 
arXiv:2104.09995v1 fatcat:q5lx74ycx5hobjox4ktl3amfta

A Systematic Review on Affective Computing: Emotion Models, Databases, and Recent Advances [article]

Yan Wang, Wei Song, Wei Tao, Antonio Liotta, Dawei Yang, Xinlei Li, Shuyong Gao, Yixuan Sun, Weifeng Ge, Wei Zhang, Wenqiang Zhang
2022 arXiv   pre-print
baseline dataset, fusion strategies for multimodal affective analysis, and unsupervised learning models.  ...  Affective computing plays a key role in human-computer interactions, entertainment, teaching, safe driving, and multimedia integration.  ...  Various DL-based approaches for TSA include deep convolutional neural network (ConvNet) learning, deep RNN learning, deep ConvNet-RNN learning and deep adversarial learning, as detailed next.  ... 
arXiv:2203.06935v3 fatcat:h4t3omkzjvcejn2kpvxns7n2qe

A Review on MAS-Based Sentiment and Stress Analysis User-Guiding and Risk-Prevention Systems in Social Network Analysis

Guillem Aguado, Vicente Julián, Ana García-Fornes, Agustín Espinosa
2020 Applied Sciences  
navigate and interact with each other in a safer way.  ...  For this reason, in this survey we explore works in the line of prevention of risks that can arise from social interaction in online environments, focusing on works using Multi-Agent System (MAS) technologies  ...  ), recorded at 48 kHz; IEMOCAP database: audio-visual data in English, only the audio track considered for this work, five male and five female speakers (leave-one-speaker-out cross-validation), six  ... 
doi:10.3390/app10196746 fatcat:m2gqf3utabgtrcvhtbh53hksfq

Embodied AI-Driven Operation of Smart Cities: A Concise Review [article]

Farzan Shenavarmasouleh, Farid Ghareh Mohammadi, M. Hadi Amini, Hamid R. Arabnia
2021 arXiv   pre-print
Embodied AI aims to train an agent that can See (Computer Vision), Talk (NLP), Navigate and Interact with its environment (Reinforcement Learning), and Reason (General Intelligence), all at the same time  ...  It focuses on learning through interaction with the surrounding environment, as opposed to Internet AI which tries to learn from static datasets.  ...  Target-driven visual navigation in indoor scenes using deep reinforcement learning. In 2017 IEEE international conference on robotics and automation (ICRA), pages 3357–3364.  ... 
arXiv:2108.09823v1 fatcat:xcjyq2ad3jgbborpldopgcd3vm
Showing results 1 — 15 out of 4,547 results