Combining spectral feature mapping and multi-channel model-based source separation for noise-robust automatic speech recognition

Deblin Bagchi, Michael I. Mandel, Zhongqiu Wang, Yanzhang He, Andrew Plummer, Eric Fosler-Lussier
2015 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)  
, followed by a DNN-based front end spectral mapper that predicts clean filterbank features.  ...  In order to address these issues, we explore the effectiveness of first applying a model-based source separation mask to the output of a beamformer that combines the source signals recorded by each microphone  ...  [Figure residue, block labels: Beamforming (MVDR | BeamformIt); MESSL Estimated Source Separation Mask; Spectral Mapping DNN-based Denoising; Backend ASR] Fig. 2.  ... 
doi:10.1109/asru.2015.7404836 dblp:conf/asru/BagchiMWHPF15 fatcat:3grj3nl4vbad7lvewap3wes5ky

Multichannel Spatial Clustering for Robust Far-Field Automatic Speech Recognition in Mismatched Conditions

Michael I. Mandel, Jon Barker
2016 Interspeech 2016  
Here it is used for the first time to drive minimum variance distortionless response (MVDR) beamforming in several ways.  ...  This approach, known as Model-based EM Source Separation and Localization (MESSL), clusters spectrogram points based on the relative differences in phase and level between pairs of microphones.  ...  While using MESSL's outputs for spatial covariance estimates of the noise and for mask-based post-filtering improved ASR performance compared to a standard baseline, their use for estimating the target  ... 
doi:10.21437/interspeech.2016-1275 dblp:conf/interspeech/MandelB16 fatcat:g4mwvdsx5rbwzn4jikkwpsnore

Enhancement of Spatial Clustering-Based Time-Frequency Masks using LSTM Neural Networks [article]

Felix Grezes, Zhaoheng Ni, Viet Anh Trinh, Michael Mandel
2020 arXiv   pre-print
By using LSTMs to enhance spatial clustering based time-frequency masks, we achieve both the signal modeling performance of multiple single-channel LSTM-DNN speech enhancers and the signal separation performance  ...  and generality of multi-channel spatial clustering.  ...  A comparison of these combination methods is given in Table 3. Then we use this final mask to estimate noise spatial covariances and perform mask-driven MVDR beamforming.  ... 
arXiv:2012.01576v1 fatcat:75oq625fxjg5pg7cipvisfahbi

Deep Learning Based Binaural Speech Separation in Reverberant Environments

Xueliang Zhang, DeLiang Wang
2017 IEEE/ACM Transactions on Audio Speech and Language Processing  
With binaural inputs, we first apply a fixed beamformer and then extract several spectral features. A new spatial feature is proposed and extracted to complement the spectral features.  ...  Index Terms: binaural speech separation; computational auditory scene analysis (CASA); room reverberation; deep neural network (DNN); beamforming  ...  Acknowledgments: The authors would like to thank Yuxuan Wang for providing his DNN code, Yi Jiang for assistance in using his SBC code and the Ohio Supercomputing Center for providing computing resources  ... 
doi:10.1109/taslp.2017.2687104 pmid:29057291 pmcid:PMC5646682 fatcat:hin6pqgdbfdtjdv6sutlqf4sym

Supervised Speech Separation Based on Deep Learning: An Overview [article]

DeLiang Wang, Jitong Chen
2018 arXiv   pre-print
This article provides a comprehensive overview of the research on deep learning based supervised speech separation in the last several years.  ...  We first introduce the background of speech separation and the formulation of supervised separation.  ...  We thank Masood Delfarah for help in manuscript preparation and Jun Du, Yu Tsao, Yuxuan Wang, Yong Xu, and Xueliang Zhang for helpful comments on an earlier version.  ... 
arXiv:1708.07524v2 fatcat:bvaa2yuppffppnta2lfpkk4v4m

Integration of neural networks and probabilistic spatial models for acoustic blind source separation

Lukas Drude, Reinhold Haeb-Umbach
2019 IEEE Journal on Selected Topics in Signal Processing  
We formulate a generic framework for blind source separation (BSS), which allows integrating data-driven spectrotemporal methods, such as deep clustering and deep attractor networks, with physically motivated  ...  The integrated model exploits the complementary strengths of the two approaches to BSS: the strong modeling power of neural networks, which, however, is based on supervised learning, and the ease of unsupervised  ...  More recently, [39] proposed the integration of a DNN-based mask estimator and a complex angular central Gaussian mixture model (cACGMM) to extract a single source.  ... 
doi:10.1109/jstsp.2019.2912565 fatcat:brneboukgneg3npnuqx4phgsom

Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation [article]

Zhong-Qiu Wang and Peidong Wang and DeLiang Wang
2021 arXiv   pre-print
We then integrate multi-microphone complex spectral mapping with minimum variance distortionless response (MVDR) beamforming and post-filtering to further improve separation, and combine it with frame-level  ...  We propose multi-microphone complex spectral mapping, a simple way of applying deep learning for time-varying non-linear beamforming, for speaker separation in reverberant conditions.  ...  masking based MVDR for separation.  ... 
arXiv:2010.01703v2 fatcat:huvvxizr2jhjlhugtwk4kr7kze

Localization Based Sequential Grouping for Continuous Speech Separation [article]

Zhong-Qiu Wang, DeLiang Wang
2021 arXiv   pre-print
results across blocks based on the DOA estimates.  ...  This study investigates robust speaker localization for continuous speech separation and speaker diarization, where we use speaker directions to group non-contiguous segments of the same speaker.  ...  The target estimates are then utilized to compute spatial covariance matrices for MVDR beamforming.  ... 
arXiv:2107.06853v1 fatcat:3strn7weijbhndqgstdpoxn4wu

Convolutive Prediction for Reverberant Speech Separation [article]

Zhong-Qiu Wang and Gordon Wichern and Jonathan Le Roux
2021 arXiv   pre-print
The beamforming and dereverberation results are used as extra features for a second DNN to perform better separation and dereverberation. State-of-the-art results are obtained on the SMS-WSJ corpus.  ...  The key idea is to first use a deep neural network (DNN) to estimate the direct-path signal of each speaker, and then identify delayed and decayed copies of the estimated direct-path signal.  ...  The major contributions of this study are the introduction of a novel dereverberation module in between the two DNNs, and its integration with beamforming.  ... 
arXiv:2108.07194v1 fatcat:4yd3hnpr6vb77dvpq2di2a6rci

Mask-based Neural Beamforming for Moving Speakers with Self-Attention-based Tracking [article]

Tsubasa Ochiai, Marc Delcroix, Tomohiro Nakatani, Shoko Araki
2022 arXiv   pre-print
development of mask-based beamformers robust to source movements.  ...  Computing the beamforming filter requires estimating spatial covariance matrices (SCMs) of the source and noise signals. Time-frequency masks are often used to compute these SCMs.  ...  There are currently two main research directions toward estimating the time-frequency masks for mask-based beamformers, i.e., spatial clustering [11] and NNs [9], [10].  ... 
arXiv:2205.03568v1 fatcat:vjuxry7azfd4zbaxpaqixukjau

A multichannel learning-based approach for sound source separation in reverberant environments

You-Siang Chen, Zi-Jie Lin, Mingsian R. Bai
2021 EURASIP Journal on Audio, Speech, and Music Processing  
In the first stage, time-dilated convolutional blocks are trained to estimate the array weights for beamforming the multichannel microphone signals.  ...  In the second stage, a U-net model is concatenated to the beamforming network to serve as a non-linear mapping filter for joint separation and dereverberation.  ...  Mingsian Bai for his three-month visit to the LMS, FAU, Erlangen-Nuremberg, which made this research work possible.  ... 
doi:10.1186/s13636-021-00227-2 fatcat:3xpqaktlzbefjho4iln3dtsbne

Exploiting CNNs for Improving Acoustic Source Localization in Noisy and Reverberant Conditions

Daniele Salvati, Carlo Drioli, Gian Luca Foresti
2018 IEEE Transactions on Emerging Topics in Computational Intelligence  
We investigate the direction of arrival (DOA) estimation problem in noisy and reverberant conditions using a uniform linear array (ULA).  ...  Experiments with both simulated and real acoustic data demonstrate the superior localization performance of the proposed SRP beamformer with respect to other state-of-the-art techniques.  ...  This work was partially supported by the "Proactive Vision for advanced UAV systems for the protection of mobile units, control of territory and environmental prevention (SUPReME)" FVG L.R. 20/2015 project  ... 
doi:10.1109/tetci.2017.2775237 dblp:journals/tetci/SalvatiDF18 fatcat:noab3bn4izfwhbcmroqd4sqefa

Multichannel End-to-end Speech Recognition [article]

Tsubasa Ochiai, Shinji Watanabe, Takaaki Hori, John R. Hershey
2017 arXiv   pre-print
Experiments on the noisy speech benchmarks (CHiME-4 and AMI) show that our multichannel end-to-end system outperformed the attention-based baseline with input from a conventional adaptive beamformer.  ...  In this paper we extend the end-to-end framework to encompass microphone array signal processing for noise suppression and speech enhancement within the acoustic encoding network.  ...  al. 2015 uses a clustering technique to perform mask estimation rather than the neural network-based techniques, but it uses the same MVDR formulation for filter estimation.  ... 
arXiv:1703.04783v1 fatcat:zjwcmk4d35ddtpo7nqutyczdse

A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation

Sharon Gannot, Emmanuel Vincent, Shmulik Markovich-Golan, Alexey Ozerov
2017 IEEE/ACM Transactions on Audio Speech and Language Processing  
In addition, they are crucial pre-processing steps for noise-robust automatic speech and speaker recognition. Many devices now have two to eight microphones.  ...  design criterion, c) the parameter estimation algorithm, and d) optional postfiltering.  ...  Intra- and inter-node location features are integrated in a clustering-based scheme for speech separation in [355].  ... 
doi:10.1109/taslp.2016.2647702 fatcat:ltfmmoguxngk5jrvzy7azzufae

DOA-guided source separation with direction-based initialization and time annotations using complex angular central Gaussian mixture models

Alexander Bohlender, Lucas Van Severen, Jonathan Sterckx, Nilesh Madhu
2022 EURASIP Journal on Audio, Speech, and Music Processing  
By means of spatial clustering and time-frequency masking, a mixture of multiple speakers and noise can be separated into the underlying signal components.  ...  In this paper, we therefore consider three techniques to overcome these limitations using direction of arrival (DOA) estimates. First, we propose an initialization with simple DOA-based masks.  ...  More recently, particularly the use of spatial clustering in conjunction with DNN-based methods for initial mask estimation has received a lot of attention.  ... 
doi:10.1186/s13636-022-00246-7 fatcat:eif2twjowvdvbiszuc6wesftz4