4,848 Hits in 3.4 sec

Hybrid Neural Networks for On-device Directional Hearing [article]

Anran Wang, Maruchi Kim, Hao Zhang, Shyamnath Gollakota
2021 arXiv   pre-print
On-device directional hearing requires audio source separation from a given direction while achieving stringent human-imperceptible latency requirements.  ...  While neural nets can achieve significantly better performance than traditional beamformers, all existing models fall short of supporting low-latency causal inference on computationally-constrained wearables  ...  (Fedorov et al. 2020 ) achieves low-power speech enhancement using LSTM, but it is not for multichannel source separation. Improving MVDR with neural nets.  ... 
arXiv:2112.05893v1 fatcat:eahy332puzdttmriiqx65bzy3q
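
The snippet above contrasts neural separators with traditional beamformers and mentions improving MVDR with neural nets. As a point of reference only, here is a minimal MVDR beamforming sketch in NumPy; the steering vector, noise covariance, and toy data are assumptions standing in for whatever a neural front end would estimate, and this is not the hybrid model proposed in the paper.

```python
# Minimal MVDR beamformer sketch (illustrative only; not the hybrid
# neural/traditional model from the paper). Assumes the noise spatial
# covariance R_noise and steering vector d are already estimated, e.g.
# by a neural mask estimator as hinted at in the abstract.
import numpy as np

def mvdr_weights(R_noise, d):
    """Per-frequency MVDR weights: w = R^-1 d / (d^H R^-1 d)."""
    Rinv_d = np.linalg.solve(R_noise, d)
    return Rinv_d / (d.conj() @ Rinv_d)

def apply_beamformer(X, w):
    """X: (mics, frames) STFT bin across time; w: (mics,) weights."""
    return w.conj() @ X  # -> (frames,) single-channel output

# Toy example: 4 mics, one frequency bin, 10 frames.
rng = np.random.default_rng(0)
mics, frames = 4, 10
d = np.exp(1j * rng.uniform(0, 2 * np.pi, mics))           # assumed steering vector
A = rng.standard_normal((mics, mics)) + 1j * rng.standard_normal((mics, mics))
R_noise = A @ A.conj().T + 0.1 * np.eye(mics)               # Hermitian PSD noise covariance
X = rng.standard_normal((mics, frames)) + 1j * rng.standard_normal((mics, frames))
y = apply_beamformer(X, mvdr_weights(R_noise, d))
print(y.shape)  # (10,)
```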

Monoaural Audio Source Separation Using Deep Convolutional Neural Networks [chapter]

Pritish Chandna, Marius Miron, Jordi Janer, Emilia Gómez
2017 Lecture Notes in Computer Science  
In this paper we introduce a low-latency monaural source separation framework using a Convolutional Neural Network (CNN).  ...  We use a CNN to estimate time-frequency soft masks which are applied for source separation.  ...  Acknowledgments: The TITANX used for this research was donated by the NVIDIA Corporation.  ... 
doi:10.1007/978-3-319-53547-0_25 fatcat:k254yklw35f5phskmdu4avaniu
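
The snippet states that a CNN estimates time-frequency soft masks which are then applied for separation. The sketch below shows just that masking-and-resynthesis step with SciPy's STFT; the mask is a random placeholder where the CNN output would go, and the window/hop sizes are assumed, not taken from the paper.

```python
# Sketch of applying a time-frequency soft mask for source separation.
# The CNN from the paper would output `mask`; here we fake it with a
# placeholder so the masking/resynthesis step is runnable on its own.
import numpy as np
from scipy.signal import stft, istft

fs = 16000
mixture = np.random.randn(fs)                 # 1 s of stand-in audio
f, t, X = stft(mixture, fs=fs, nperseg=512, noverlap=384)

mask = np.random.rand(*X.shape)               # placeholder for CNN-estimated soft mask in [0, 1]
source_spec = mask * X                        # element-wise masking of the complex STFT
_, source = istft(source_spec, fs=fs, nperseg=512, noverlap=384)
print(source.shape)
```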

Low-Latency Deep Clustering For Speech Separation [article]

Shanshan Wang, Gaurav Naithani, Tuomas Virtanen
2019 arXiv   pre-print
This paper proposes a low algorithmic latency adaptation of the deep clustering approach to speaker-independent speech separation.  ...  for low-latency operation, and, c) using a buffer in the beginning of audio mixture to estimate cluster centres corresponding to constituent speakers which are then utilized to separate speakers within  ...  LOW-LATENCY DEEP CLUSTERING In order to make the deep clustering based separation operate with low latency, there are three parts that need to be adapted: a) The topology of the neural network is changed  ... 
arXiv:1902.07033v1 fatcat:352bnrktkveylixgyaypezq5ee
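
The snippet describes estimating cluster centres from an initial buffer of the mixture and then reusing them to separate subsequent frames. A rough sketch of that buffer-then-assign idea follows, with random embeddings standing in for the deep clustering network's output and k-means standing in for the clustering step; dimensions and buffer length are assumptions.

```python
# Rough sketch of the buffer idea described in the snippet: fit cluster
# centres on embeddings from an initial buffer, then assign later frames
# to the nearest centre to form binary masks with low latency.
# Embeddings are random stand-ins for the deep clustering network output.
import numpy as np
from sklearn.cluster import KMeans

n_freq, emb_dim, n_speakers = 129, 20, 2
buffer_emb = np.random.randn(200 * n_freq, emb_dim)   # embeddings from the initial buffer
kmeans = KMeans(n_clusters=n_speakers, n_init=10).fit(buffer_emb)

# Online phase: one new frame of T-F embeddings -> binary masks per speaker.
frame_emb = np.random.randn(n_freq, emb_dim)
labels = kmeans.predict(frame_emb)                    # nearest fixed centre per T-F bin
masks = np.stack([(labels == k).astype(float) for k in range(n_speakers)])
print(masks.shape)  # (2, 129)
```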

Author Index

2019 2019 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS)  
Microcontroller Yuuki Nishida 328 Data Retrieval from Printed Image Using Image Features and Data Embedding Preprocessing for Sound Source Separation Using Complex Weighted Sum Circuits Hiroshi Ochi  ...  Light Detection using Convolutional Neural Networks and Lidar Data Wei-Hung Lin A Design Framework for Hardware Approximation of Deep Neural Networks Kuan-Jen Lin A Compact Triple Passband Bandpass  ...  Chang-Rong Wu A Continuous Facial Expression Recognition Model based on Deep Learning Method Chao-Ming Wu A Continuous Facial Expression Recognition Model based on Deep Learning Method Zheng-Lin  ... 
doi:10.1109/ispacs48206.2019.8986344 fatcat:tyfkzg6wt5fr3m3ngmh2vdxsea

Low-latency Speaker-independent Continuous Speech Separation

Takuya Yoshioka, Zhuo Chen, Changliang Liu, Xiong Xiao, Hakan Erdogan, Dimitrios Dimitriadis
2019 ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
The previous SI-CSS method uses a neural network trained with permutation invariant training and a data-driven beamformer and thus requires much processing latency.  ...  This is achieved (1) by using a new speech separation network architecture combined with a double buffering scheme and (2) by performing enhancement with a set of fixed beamformers followed by a neural  ...  Our network model does not use any future data frames. Sound source localization The enhancement processing starts with performing SSL for each of the target and interference speakers.  ... 
doi:10.1109/icassp.2019.8682274 dblp:conf/icassp/YoshiokaCLXED19 fatcat:nvx2cfpbkrfr3jm2fv4f3v6ec4
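
The snippet mentions enhancement with a set of fixed beamformers followed by a neural network, using no future frames. Below is a schematic sketch of a fixed beamformer bank applied to one STFT frame; the delay-and-sum weights, array geometry, and energy-based selection are illustrative assumptions, not the design used in the paper (where a neural network handles the downstream processing).

```python
# Schematic sketch of "a set of fixed beamformers" applied frame by frame,
# as mentioned in the snippet. The delay-and-sum weights and the simple
# energy-based selection stand in for the paper's actual fixed beamformer
# bank and neural post-processing.
import numpy as np

def delay_and_sum_weights(n_mics, angle_rad, freq_hz, spacing_m=0.05, c=343.0):
    delays = np.arange(n_mics) * spacing_m * np.cos(angle_rad) / c
    return np.exp(-2j * np.pi * freq_hz * delays) / n_mics

n_mics, freq = 7, 1000.0
angles = np.deg2rad(np.arange(0, 360, 20))                   # assumed 18 fixed look directions
bank = np.stack([delay_and_sum_weights(n_mics, a, freq) for a in angles])

X = np.random.randn(n_mics) + 1j * np.random.randn(n_mics)   # one STFT frame, one bin
outputs = bank.conj() @ X                                     # output of every fixed beamformer
best = np.argmax(np.abs(outputs))                             # toy selection; the paper uses a neural net
print(best, outputs.shape)
```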

Low-Latency Speaker-Independent Continuous Speech Separation [article]

Takuya Yoshioka, Zhuo Chen, Changliang Liu, Xiong Xiao, Hakan Erdogan, Dimitrios Dimitriadis
2019 arXiv   pre-print
The previous SI-CSS method uses a neural network trained with permutation invariant training and a data-driven beamformer and thus requires much processing latency.  ...  This paper proposes a low-latency SI-CSS method whose performance is comparable to that of the previous method in a microphone array-based meeting transcription task.This is achieved (1) by using a new  ...  Our network model does not use any future data frames. Sound source localization The enhancement processing starts with performing SSL for each of the target and interference speakers.  ... 
arXiv:1904.06478v1 fatcat:wjozo4mthvel7bpwuuoiw6y4ny

TasNet: time-domain audio separation network for real-time, single-channel speech separation [article]

Yi Luo, Nima Mesgarani
2018 arXiv   pre-print
We directly model the signal in the time-domain using an encoder-decoder framework and perform the source separation on nonnegative encoder outputs.  ...  We propose Time-domain Audio Separation Network (TasNet) to overcome these limitations.  ...  A typical neural network speech separation algorithm starts with calculating the short-time Fourier transform (STFT) to create a time-frequency (T-F) representation of the mixture sound.  ... 
arXiv:1711.00541v2 fatcat:xgtgy5h3d5hitd2tveqgdl7mni
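
The snippet contrasts the usual STFT-based pipeline with TasNet's direct time-domain modeling, where separation is performed on nonnegative encoder outputs. The toy sketch below mirrors that encoder-mask-decoder structure with random bases and placeholder masks; frame length, hop, and basis size are assumptions, not TasNet's trained parameters.

```python
# Toy sketch of the TasNet-style idea in the snippet: encode short time-domain
# frames with a learned basis, mask the nonnegative encoder outputs per source,
# and decode back to waveforms. Random bases stand in for trained weights.
import numpy as np

frame_len, hop, n_basis, n_sources = 40, 20, 256, 2   # ~2.5 ms frames at 16 kHz (assumed)
encoder = np.abs(np.random.randn(n_basis, frame_len)) # nonnegative "learned" analysis basis
decoder = np.random.randn(frame_len, n_basis)         # synthesis basis

x = np.random.randn(16000)                            # 1 s of stand-in mixture
frames = np.stack([x[i:i + frame_len] for i in range(0, len(x) - frame_len, hop)])

w = np.maximum(frames @ encoder.T, 0)                 # nonnegative encoder outputs (frames, n_basis)
masks = np.random.rand(n_sources, *w.shape)           # placeholder for separator-network masks
masks /= masks.sum(axis=0, keepdims=True)             # masks sum to one across sources
est_frames = (masks * w) @ decoder.T                  # (n_sources, frames, frame_len)
print(est_frames.shape)
```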

Audio-Visual Target Speaker Enhancement on Multi-Talker Environment using Event-Driven Cameras [article]

Ander Arriandiaga, Giovanni Morrone, Luca Pasa, Leonardo Badino, Chiara Bartolozzi
2021 arXiv   pre-print
In order to overcome this limitation, we propose the use of event-driven cameras and exploit compression, high temporal resolution and low latency, for low cost and low latency motion feature extraction  ...  However, all approaches proposed so far work offline, using frame-based video input, making it difficult to process an audio-visual signal with low latency, for online applications.  ...  Humans solve this problem using complementary and redundant strategies such as physical sound source separation (thanks to stereo sound acquisition [2] ) but also using cues from observing the motion  ... 
arXiv:1912.02671v2 fatcat:fwlis7jmdbbhpg2l2ezpi2jkhm

Deep Neural Network Based Low-Latency Speech Separation with Asymmetric Analysis-Synthesis Window Pair [article]

Shanshan Wang, Gaurav Naithani, Archontis Politis, Tuomas Virtanen
2021 arXiv   pre-print
Time-frequency masking or spectrum prediction computed via short symmetric windows are commonly used in low-latency deep neural network (DNN) based source separation.  ...  We report an improvement in separation performance of up to 1.5 dB in terms of source-to-distortion ratio (SDR) while maintaining an algorithmic latency of 8 ms.  ...  CONCLUSION In this paper, we propose to use asymmetric analysis/synthesis pairs for low-latency DNN-based speech separation.  ... 
arXiv:2106.11794v1 fatcat:4dvqvgds4zhgzeb4olbpvxr5wa
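
The snippet proposes asymmetric analysis/synthesis window pairs so that a long analysis window can be used while the algorithmic latency stays around 8 ms. The sketch below illustrates the general idea (a long, tail-skewed analysis window paired with a short synthesis window at the frame's end); the exact window construction in the paper differs, and all lengths here are assumed.

```python
# Rough illustration of an asymmetric analysis/synthesis window pair: a long
# analysis window for good frequency resolution, paired with a short synthesis
# window at its tail so the algorithmic latency is governed by the short part.
# This is a schematic construction, not the exact windows used in the paper.
import numpy as np

fs = 16000
long_len, short_len, hop = 1024, 128, 64          # 64 ms / 8 ms / 4 ms (assumed)

rise = np.hanning(2 * (long_len - short_len // 2))[: long_len - short_len // 2]
fall = np.hanning(short_len)[short_len // 2:]
analysis = np.concatenate([rise, fall])           # long window, skewed toward its end

synthesis = np.zeros(long_len)
synthesis[-short_len:] = np.hanning(short_len)    # short window at the frame's tail

latency_ms = 1000 * short_len / fs                # ~8 ms algorithmic latency in this sketch
print(analysis.shape, synthesis.shape, latency_ms)
```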

Generating Data To Train Convolutional Neural Networks For Classical Music Source Separation

Marius Miron, Jordi Janer, Emilia Gómez
2017 Proceedings of the SMC Conferences  
Acknowledgments: The TITANX used for this research was donated by the NVIDIA Corporation. (Proceedings of the 14th Sound and Music Computing Conference, July 5-8, Espoo, Finland, SMC2017-232)  ...  Data-driven approaches using deep neural networks involve learning binary or soft masks corresponding to the target sources [7] [8] [9] [10] [11].  ...  OUTLOOK We proposed a method to generate training data for timbre-informed source separation methods using neural networks, in the context of classical music.  ... 
doi:10.5281/zenodo.1401922 fatcat:z4bq6dksynhodglcsybhby34iu
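
The snippet describes generating training data for neural source separation in classical music. The sketch below shows only the generic recipe of mixing isolated stems with random gains into (mixture, targets) pairs; it is not the timbre-informed, score-driven procedure proposed in the paper, and the stem data and gain range are assumptions.

```python
# Generic sketch of building (mixture, targets) training pairs by mixing
# isolated stems with random gains. This is the common recipe, not the
# timbre-informed generation method proposed in the paper.
import numpy as np

def make_training_pair(stems, rng):
    """stems: list of equal-length 1-D arrays (isolated sources)."""
    gains = rng.uniform(0.5, 1.0, size=len(stems))
    scaled = [g * s for g, s in zip(gains, stems)]
    mixture = np.sum(scaled, axis=0)
    return mixture, np.stack(scaled)   # network input, per-source targets

rng = np.random.default_rng(42)
stems = [rng.standard_normal(16000) for _ in range(4)]   # 4 stand-in instrument stems
mix, targets = make_training_pair(stems, rng)
print(mix.shape, targets.shape)  # (16000,) (4, 16000)
```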

An Overview of Machine Learning and 5G for People with Disabilities

Mari Carmen Domingo
2021 Sensors  
For this purpose, the proposed 5G network slicing architecture for disabled people is introduced.  ...  Another face recognition system, based on computer vision algorithms (region proposal networks, ATLAS tracking, and global and low-level image descriptors) and deep convolutional neural networks [26]  ...  A light-weight deep convolutional neural network enables location-independent acoustic event recognition.  ... 
doi:10.3390/s21227572 pmid:34833648 pmcid:PMC8622934 fatcat:lwz2pv2drrf6bkoq37iaa45rxi

Deep neural network based speech separation optimizing an objective estimator of intelligibility for low latency applications [article]

Gaurav Naithani, Joonas Nikunen, Lars Bramsløw, Tuomas Virtanen
2018 arXiv   pre-print
Mean square error (MSE) has been the preferred choice as loss function in the current deep neural network (DNN) based speech separation techniques.  ...  We focus on applications where low algorithmic latency (≤ 10 ms) is important.  ...  In recent years, however, purely data driven discriminative approaches like deep neural networks (DNNs) (e.g., in [4, 5] ) have achieved great success.  ... 
arXiv:1807.06899v1 fatcat:cbepe6fxvvdh3ilfnbvh7sm5ny
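
The snippet contrasts the usual MSE loss with a loss built on an objective intelligibility estimator. The sketch below shows a plain MSE loss next to a crude correlation-based proxy in the spirit of such measures (e.g., STOI-like band correlations); the proxy is a simplification for illustration, not the estimator optimized in the paper.

```python
# Sketch contrasting a plain MSE loss on magnitude spectra with a crude
# correlation-based proxy in the spirit of objective intelligibility measures.
# The proxy below is a simplification, not the estimator used in the paper.
import numpy as np

def mse_loss(est_mag, ref_mag):
    return np.mean((est_mag - ref_mag) ** 2)

def correlation_proxy_loss(est_mag, ref_mag, eps=1e-8):
    """1 - mean per-frequency-band correlation between short-time envelopes."""
    est = est_mag - est_mag.mean(axis=1, keepdims=True)
    ref = ref_mag - ref_mag.mean(axis=1, keepdims=True)
    corr = (est * ref).sum(axis=1) / (
        np.linalg.norm(est, axis=1) * np.linalg.norm(ref, axis=1) + eps)
    return 1.0 - corr.mean()

ref = np.abs(np.random.randn(64, 100))   # stand-in reference magnitude spectrogram
est = ref + 0.1 * np.random.randn(64, 100)
print(mse_loss(est, ref), correlation_proxy_loss(est, ref))
```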

Design and Integration of alert signal detector and separator for hearing Aid applications

Gautam S Bhat, Nikhil Shankar, Issa M. Panahi
2020 IEEE Access  
The proposed method is based on convolutional neural network (CNN) and convolutional-recurrent neural network (CRNN).  ...  The algorithm is computationally efficient with a low processing delay.  ...  Recently, SE based on deep neural networks (DNN) have been proposed by researchers [18] - [21] .  ... 
doi:10.1109/access.2020.2999546 pmid:32793404 pmcid:PMC7423022 fatcat:ycvntvzxlbbkdjwqrsoeubkujq

Audio Source Separation Using Deep Neural Networks

Pritish Chandna, Jordi Janer, Marius Miron
2016 Zenodo  
This thesis presents a low-latency online source separation algorithm based on convolutional neural networks.  ...  Building on ideas from previous research on source separation, we propose an algorithm using a deep neural network with convolutional layers.  ...  While the former focuses on instrument-based separation, the latter uses neural networks for multi-channel sound source separation.  ... 
doi:10.5281/zenodo.3755620 fatcat:girvxhgbv5cqplktyzmv22gaqu

A Review of Deep Learning Based Methods for Acoustic Scene Classification

Jakob Abeßer
2020 Applied Sciences  
, and for data modeling, i.e., neural network architectures and learning paradigms.  ...  With a focus on deep learning-based ASC algorithms, this article summarizes and groups existing approaches for data preparation, i.e., feature representations, feature pre-processing, and data augmentation  ...  Real-time processing requirements often demand fast model prediction with low latency.  ... 
doi:10.3390/app10062020 fatcat:6uq7xj62o5cprjqd5smmppzhkm
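
The review groups ASC systems by their feature representations, among other axes; log-mel spectrograms are the most common front end in that space. A short sketch of computing log-mel features with librosa follows; the sampling rate, FFT size, hop, and mel-band count are typical values, not parameters taken from the article.

```python
# Common ASC front end hinted at by the review's "feature representations":
# a log-scaled mel spectrogram. Parameter values here are typical defaults,
# not taken from the article.
import numpy as np
import librosa

y = np.random.randn(10 * 16000).astype(np.float32)   # 10 s stand-in scene recording
mel = librosa.feature.melspectrogram(
    y=y, sr=16000, n_fft=1024, hop_length=512, n_mels=64)
log_mel = librosa.power_to_db(mel)                    # (64 mel bands, ~313 frames)
print(log_mel.shape)
```
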
Showing results 1 — 15 out of 4,848 results