768 Hits in 7.1 sec

Speech acoustic modeling from raw multichannel waveforms

Yedid Hoshen, Ron J. Weiss, Kevin W. Wilson
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
In this paper, we describe a convolutional neural network-deep neural network (CNN-DNN) acoustic model which takes raw multichannel waveforms as input, i.e., without any preceding feature extraction, and  ...  Standard deep neural network-based acoustic models for automatic speech recognition (ASR) rely on hand-engineered input features, typically log-mel filterbank magnitudes.  ...  They pass the waveform into a fully connected layer, which likely requires additional hidden units in order for the network to learn multiple phase shifts of the same filter.  ... 
doi:10.1109/icassp.2015.7178847 dblp:conf/icassp/HoshenWW15 fatcat:sndxkvxqxnaq7engk2hx7d2woy
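
The mechanism this snippet describes, a first network layer that convolves the raw multichannel waveform in time rather than consuming log-mel features, can be sketched as follows. This is a hedged illustration in PyTorch, not the paper's actual model: channel count, number of filters, filter length, and hop are placeholder values, and the rectify-and-log-compress step stands in for whatever pooling/compression the authors used. Note that a multi-input-channel Conv1d learns one kernel per microphone and sums their outputs, a filter-and-sum style operation.

```python
# Minimal sketch (not the paper's exact architecture): a first layer that
# convolves a raw multichannel waveform in time, rectifies, and pools,
# yielding frame-level features for a downstream DNN classifier.
import torch
import torch.nn as nn

class RawWaveformFrontEnd(nn.Module):
    def __init__(self, num_channels=2, num_filters=40, filter_len=400, hop=160):
        super().__init__()
        # filter_len/hop correspond to 25 ms / 10 ms at 16 kHz; these are
        # illustrative choices, not values taken from the paper.
        self.conv = nn.Conv1d(num_channels, num_filters,
                              kernel_size=filter_len, stride=hop, bias=False)

    def forward(self, wav):            # wav: (batch, channels, samples)
        filtered = self.conv(wav)      # (batch, filters, frames)
        energy = torch.relu(filtered)
        return torch.log1p(energy)     # compressed, filterbank-like output

frontend = RawWaveformFrontEnd()
frames = frontend(torch.randn(8, 2, 16000))   # 8 one-second, two-channel waveforms
print(frames.shape)                            # torch.Size([8, 40, 98])
```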

2020 Index IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 28

2020 IEEE/ACM Transactions on Audio Speech and Language Processing  
..., +, TASLP 2020 605-618. Multichannel Speech Enhancement by Raw Waveform-Mapping Using Fully Convolutional Networks.  ...  ..., +, TASLP 2020 876-888. Phase estimation: Multichannel Speech Enhancement by Raw Waveform-Mapping Using Fully Convolutional Networks.  ...  Target tracking: Multi-Hypothesis Square-Root Cubature Kalman Particle Filter for Speaker Tracking in Noisy and Reverberant Environments. Zhang, Q., +, TASLP 2020 1183-1197  ... 
doi:10.1109/taslp.2021.3055391 fatcat:7vmstynfqvaprgz6qy3ekinkt4

Deep Learning for Audio Signal Processing

Hendrik Purwins, Bo Li, Tuomas Virtanen, Jan Schlüter, Shuo-Yiin Chang, Tara N. Sainath
2019 IEEE Journal on Selected Topics in Signal Processing  
The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory  ...  Given the recent surge in developments of deep learning, this article provides a review of the state-of-the-art deep learning techniques for audio signal processing.  ...  In general, source localization requires the use of interchannel information, which can also be learned by a deep neural network with a suitable topology from within-channel features, for example by convolutional  ... 
doi:10.1109/jstsp.2019.2908700 fatcat:oy2qixj2dfe6hns7r7av6fw2wm
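
For contrast with the learned raw-waveform front ends covered above, the hand-engineered log-mel representation this review calls dominant can be computed in a few lines. A minimal sketch using torchaudio; the FFT size, hop, and mel-band count are common defaults, not values prescribed by the article.

```python
# Illustrative log-mel feature extraction (25 ms window / 10 ms hop at 16 kHz,
# 80 mel bands); parameter values are typical choices, not from the review.
import torch
import torchaudio

mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=400, hop_length=160, n_mels=80)
wav = torch.randn(1, 16000)             # placeholder: 1 s of audio at 16 kHz
log_mel = torch.log(mel(wav) + 1e-6)    # (1, 80, frames)
```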

Speaker location and microphone spacing invariant acoustic modeling from raw multichannel waveforms

Tara N. Sainath, Ron J. Weiss, Kevin W. Wilson, Arun Narayanan, Michiel Bacchiani, Andrew Senior
2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)  
In this paper, we present an algorithm to do multichannel enhancement jointly with the acoustic model, using a raw waveform convolutional LSTM deep neural network (CLDNN).  ...  Analysis shows that the proposed network learns to be robust to varying angles of arrival for the target speaker, and performs as well as a model that is given oracle knowledge of the true location.  ...  Recently, we have shown that acoustic models trained directly on the single channel raw time-domain waveform [11] using a convolutional, long short-term memory, deep neural network (CLDNN) [12] can  ... 
doi:10.1109/asru.2015.7404770 dblp:conf/asru/SainathWWNBS15 fatcat:6evifjtjdfea3mx3rl2mjdh7wy

Recent progresses in deep learning based acoustic models

Dong Yu, Jinyu Li
2017 IEEE/CAA Journal of Automatica Sinica  
We then describe acoustic models that are optimized end-to-end with emphasis on feature representations learned jointly with the rest of the system, the connectionist temporal classification (CTC) criterion  ...  In this paper, we summarize recent progress made in deep learning based acoustic models and the motivation and insights behind the surveyed techniques.  ...  In the first layer, multiple time convolution filters are used to map the raw waveforms from multiple microphones into a single time-frequency representation [56].  ... 
doi:10.1109/jas.2017.7510508 fatcat:zcffvbg75bhllcekqghkmwidsy

Raw Multichannel Processing Using Deep Neural Networks [chapter]

Tara N. Sainath, Ron J. Weiss, Kevin W. Wilson, Arun Narayanan, Michiel Bacchiani, Bo Li, Ehsan Variani, Izhak Shafran, Andrew Senior, Kean Chin, Ananya Misra, Chanwoo Kim
2017 New Era for Robust Speech Recognition  
In this chapter, we perform multichannel enhancement jointly with acoustic modeling in a deep neural network framework.  ...  We introduce a neural network architecture which performs multichannel filtering in the first layer of the network and show that this network learns to be robust to varying target speaker direction of  ...  The output of this spectral filtering layer is passed to an acoustic model, such as a convolutional long short-term memory, deep neural network (CLDNN) acoustic model [29] .  ... 
doi:10.1007/978-3-319-64680-0_5 fatcat:22k7btluzvalnf7sbepu5w5pae
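
The pipeline this chapter describes feeds the output of a first spectral/spatial filtering layer into a CLDNN-style acoustic model: convolution, then recurrence, then a fully connected classifier. The sketch below is a generic CLDNN skeleton under assumed dimensions, not the architecture from the chapter; layer sizes and the number of output targets are placeholders.

```python
import torch
import torch.nn as nn

class TinyCLDNN(nn.Module):
    """Generic conv + LSTM + DNN stack over frame-level features (placeholder sizes)."""
    def __init__(self, feat_dim=40, num_targets=500):
        super().__init__()
        self.conv = nn.Conv1d(feat_dim, 128, kernel_size=5, padding=2)   # local context conv
        self.lstm = nn.LSTM(input_size=128, hidden_size=256, num_layers=2, batch_first=True)
        self.dnn = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, num_targets))

    def forward(self, feats):                  # feats: (batch, frames, feat_dim)
        x = self.conv(feats.transpose(1, 2))   # (batch, 128, frames)
        x, _ = self.lstm(x.transpose(1, 2))    # (batch, frames, 256)
        return self.dnn(x)                     # per-frame target logits

model = TinyCLDNN()
logits = model(torch.randn(4, 100, 40))        # e.g. output of a learned filterbank layer
```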

A Review of Deep Learning Based Methods for Acoustic Scene Classification

Jakob Abeßer
2020 Applied Sciences  
With a focus on deep learning based ASC algorithms, this article summarizes and groups existing approaches for data preparation, i.e., feature representations, feature pre-processing, and data augmentation, and for data modeling, i.e., neural network architectures and learning paradigms.  ... 
doi:10.3390/app10062020 fatcat:6uq7xj62o5cprjqd5smmppzhkm

Recent Progresses in Deep Learning based Acoustic Models (Updated) [article]

Dong Yu, Jinyu Li
2018 arXiv   pre-print
We then describe acoustic models that are optimized end-to-end with emphasis on feature representations learned jointly with the rest of the system, the connectionist temporal classification (CTC) criterion  ...  In this paper, we summarize recent progress made in deep learning based acoustic models and the motivation and insights behind the surveyed techniques.  ...  In the first layer, multiple time convolution filters are used to map the raw waveforms from multiple microphones into a single time-frequency representation [58].  ... 
arXiv:1804.09298v2 fatcat:yfxzxu6qanbndcnmt3loikqeym

Audio-Visual Model Distillation Using Acoustic Images

Andres F. Perez, Valentina Sanguineti, Pietro Morerio, Vittorio Murino
2020 IEEE Winter Conference on Applications of Computer Vision (WACV)  
In this paper, we investigate how to learn rich and robust feature representations for audio classification from visual data and acoustic images, a novel audio data modality.  ...  Using this richer information, we train audio deep learning models in a teacher-student fashion. In particular, we distill knowledge into audio networks from both visual and acoustic image teachers.  ...  Nevertheless, none of the past works tried to exploit spatially localized acoustic data to assess the potentialities of such richer information source.  ... 
doi:10.1109/wacv45572.2020.9093307 dblp:conf/wacv/PerezSMM20 fatcat:cf3cdewcwndbrddttjysuo3jhe
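
The teacher-student training this abstract refers to is, at its core, a distillation objective: the audio student matches the soft predictions of the visual and acoustic-image teachers while also fitting the hard labels. Below is a generic sketch of that loss; the temperature and mixing weight are illustrative, and the paper's exact objective may differ.

```python
# Hedged sketch of a standard distillation loss: softened-teacher KL term
# plus hard-label cross-entropy. Not the paper's specific formulation.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```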

A Survey of Sound Source Localization with Deep Learning Methods [article]

Pierre-Amaury Grumiaux, Srđan Kitić, Laurent Girin, Alexandre Guérin
2021 arXiv   pre-print
This article is a survey on deep learning methods for single and multiple sound source localization.  ...  This way, an interested reader can easily comprehend the vast panorama of the deep learning-based sound source localization methods.  ...  However, it is interesting to mention that in some studies [217] , [218] , the visual inspection of the learned weights of the input layers of some end-to-end (waveform-based) neural networks has revealed  ... 
arXiv:2109.03465v2 fatcat:4zsolfkfsfgavnykr72svnjkjq

Deep learning approaches for neural decoding: from CNNs to LSTMs and spikes to fMRI [article]

Jesse A. Livezey, Joshua I. Glaser
2020 arXiv   pre-print
The success of deep networks in other domains has led to a new wave of applications in neuroscience. In this article, we review deep learning approaches to neural decoding.  ...  for complex decoding targets like acoustic speech or images.  ...  Acknowledgements We would like to thank Ella Batty and Charles Frye for very helpful comments on this manuscript.  ... 
arXiv:2005.09687v1 fatcat:grboww5ptvah5npbl3xeehbady

Audio-Visual Model Distillation Using Acoustic Images [article]

Andrés F. Pérez, Valentina Sanguineti, Pietro Morerio, Vittorio Murino
2020 arXiv   pre-print
In this paper, we investigate how to learn rich and robust feature representations for audio classification from visual data and acoustic images, a novel audio data modality.  ...  Using this richer information, we train audio deep learning models in a teacher-student fashion. In particular, we distill knowledge into audio networks from both visual and acoustic image teachers.  ...  Nevertheless, none of the past works tried to exploit spatially localized acoustic data to assess the potentialities of such richer information source.  ... 
arXiv:1904.07933v2 fatcat:wdxa3pcc75cfxdmzgtqm4szkpi

End-to-End Multi-Look Keyword Spotting

Meng Yu, Xuan Ji, Bo Wu, Dan Su, Dong Yu
2020 Interspeech 2020  
In this paper, we propose multi-look neural network modeling for speech enhancement which simultaneously steers to listen to multiple sampled look directions.  ...  The multi-look enhancement is then jointly trained with KWS to form an end-to-end KWS model which integrates the enhanced signals from multiple look directions and leverages an attention mechanism to dynamically  ...  Neural network based multi-look filtering in [22]-[24] implicitly learns filters for enhancing sources from different spatial look directions and passes all the filtered signals to an acoustic model  ... 
doi:10.21437/interspeech.2020-1521 dblp:conf/interspeech/0003JW0020 fatcat:sqskgn2e7nfjrbglq5a6ca5voi
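
The multi-look idea described here, several filtering branches each steered toward a sampled look direction whose outputs are combined by attention before the keyword classifier, can be caricatured as below. Everything in the sketch is a placeholder (number of looks, filter lengths, per-look encoder, attention form); it is a structural illustration, not the proposed model.

```python
import torch
import torch.nn as nn

class MultiLookKWS(nn.Module):
    """Structural sketch: per-look multichannel filters -> attention pooling -> KWS logits."""
    def __init__(self, num_mics=4, num_looks=4, num_keywords=10, feat=32):
        super().__init__()
        # One learned multichannel filter per look direction (placeholder sizes).
        self.looks = nn.ModuleList(
            [nn.Conv1d(num_mics, feat, kernel_size=400, stride=160) for _ in range(num_looks)])
        self.score = nn.Linear(feat, 1)           # attention score per look
        self.kws = nn.Linear(feat, num_keywords)

    def forward(self, wav):                       # wav: (batch, mics, samples)
        # Each look yields an utterance-level embedding (mean over frames).
        emb = torch.stack([torch.relu(f(wav)).mean(dim=2) for f in self.looks], dim=1)
        att = torch.softmax(self.score(emb), dim=1)   # (batch, looks, 1)
        fused = (att * emb).sum(dim=1)                # (batch, feat)
        return self.kws(fused)                        # keyword logits

logits = MultiLookKWS()(torch.randn(2, 4, 16000))
```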

End-to-End Multi-Look Keyword Spotting [article]

Meng Yu, Xuan Ji, Bo Wu, Dan Su, Dong Yu
2020 arXiv   pre-print
In this paper, we propose multi-look neural network modeling for speech enhancement which simultaneously steers to listen to multiple sampled look directions.  ...  The multi-look enhancement is then jointly trained with KWS to form an end-to-end KWS model which integrates the enhanced signals from multiple look directions and leverages an attention mechanism to dynamically  ...  Neural network based multi-look filtering in [22]-[24] implicitly learns filters for enhancing sources from different spatial look directions and passes all the filtered signals to an acoustic model  ... 
arXiv:2005.10386v1 fatcat:phmksvpmrrf7jeenir5t2u4v7y

Learning Multiscale Features Directly from Waveforms

Zhenyao Zhu, Jesse H. Engel, Awni Hannun
2016 Interspeech 2016  
However, true end-to-end learning, where features are learned directly from waveforms, has only recently reached the performance of hand-tailored representations based on the Fourier transform.  ...  In this paper, we detail an approach to use convolutional filters to push past the inherent tradeoff of temporal and frequency resolution that exists for spectral representations.  ...  Herein, we explore jointly learning filter banks at multiple different scales on raw waveforms. [Figure 1: Diagram of multiscale convolutions for learning directly from waveforms.]  ... 
doi:10.21437/interspeech.2016-256 dblp:conf/interspeech/ZhuEH16 fatcat:nl5jbumpvbfctonygcblyslt7q
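
The multiscale idea in this last entry, parallel convolutional filter banks with different filter lengths so that short filters preserve temporal resolution while long filters give finer frequency resolution, looks roughly like the following. Filter counts, lengths, and the shared hop are made-up values, not those from the paper.

```python
import torch
import torch.nn as nn

class MultiscaleWaveformConv(nn.Module):
    """Parallel time-convolution branches at different scales, concatenated per frame."""
    def __init__(self, scales=(80, 200, 400), filters_per_scale=32, hop=160):
        super().__init__()
        # Same stride for all branches so their frames align; padding of half the
        # kernel keeps frame counts nearly equal (placeholder sizes throughout).
        self.branches = nn.ModuleList(
            [nn.Conv1d(1, filters_per_scale, kernel_size=k, stride=hop, padding=k // 2)
             for k in scales])

    def forward(self, wav):                           # wav: (batch, 1, samples)
        frames = [torch.relu(b(wav)) for b in self.branches]
        n = min(f.shape[-1] for f in frames)          # trim to a common frame count
        return torch.cat([f[..., :n] for f in frames], dim=1)   # (batch, 3*32, frames)

feats = MultiscaleWaveformConv()(torch.randn(2, 1, 16000))
```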
Showing results 1 — 15 out of 768 results