Combining spectral feature mapping and multi-channel model-based source separation for noise-robust automatic speech recognition
2015
2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
, followed by a DNN-based front end spectral mapper that predicts clean filterbank features. ...
In order to address these issues, we explore the effectiveness of first applying a model-based source separation mask to the output of a beamformer that combines the source signals recorded by each microphone ...
Fig. 2. ... [Block diagram labels: Beamforming (MVDR | BeamformIt); MESSL; Estimated Source Separation Mask; Spectral Mapping; DNN-based Denoising; Backend ASR]
doi:10.1109/asru.2015.7404836
dblp:conf/asru/BagchiMWHPF15
fatcat:3grj3nl4vbad7lvewap3wes5ky
Multichannel Spatial Clustering for Robust Far-Field Automatic Speech Recognition in Mismatched Conditions
2016
Interspeech 2016
Here it is used for the first time to drive minimum variance distortionless response (MVDR) beamforming in several ways. ...
This approach, known as Model-based EM Source Separation and Localization (MESSL), clusters spectrogram points based on the relative differences in phase and level between pairs of microphones. ...
While using MESSL's outputs for spatial covariance estimates of the noise and for mask-based post-filtering improved ASR performance compared to a standard baseline, their use for estimating the target ...
doi:10.21437/interspeech.2016-1275
dblp:conf/interspeech/MandelB16
fatcat:g4mwvdsx5rbwzn4jikkwpsnore
Enhancement of Spatial Clustering-Based Time-Frequency Masks using LSTM Neural Networks
[article]
2020
arXiv
pre-print
By using LSTMs to enhance spatial clustering based time-frequency masks, we achieve both the signal modeling performance of multiple single-channel LSTM-DNN speech enhancers and the signal separation performance ...
and generality of multi-channel spatial clustering. ...
A comparison of these combination methods is given in Table 3. Then we use this final mask to estimate noise spatial covariances and perform mask-driven MVDR beamforming. ...
arXiv:2012.01576v1
fatcat:75oq625fxjg5pg7cipvisfahbi
Deep Learning Based Binaural Speech Separation in Reverberant Environments
2017
IEEE/ACM Transactions on Audio Speech and Language Processing
With binaural inputs, we first apply a fixed beamformer and then extract several spectral features. A new spatial feature is proposed and extracted to complement the spectral features. ...
Index Terms: binaural speech separation; computational auditory scene analysis (CASA); room reverberation; deep neural network (DNN); beamforming. ...
Acknowledgments: The authors would like to thank Yuxuan Wang for providing his DNN code, Yi Jiang for assistance in using his SBC code, and the Ohio Supercomputing Center for providing computing resources ...
doi:10.1109/taslp.2017.2687104
pmid:29057291
pmcid:PMC5646682
fatcat:hin6pqgdbfdtjdv6sutlqf4sym
Supervised Speech Separation Based on Deep Learning: An Overview
[article]
2018
arXiv
pre-print
This article provides a comprehensive overview of the research on deep learning based supervised speech separation in the last several years. ...
We first introduce the background of speech separation and the formulation of supervised separation. ...
We thank Masood Delfarah for help in manuscript preparation and Jun Du, Yu Tsao, Yuxuan Wang, Yong Xu, and Xueliang Zhang for helpful comments on an earlier version. ...
arXiv:1708.07524v2
fatcat:bvaa2yuppffppnta2lfpkk4v4m
Integration of neural networks and probabilistic spatial models for acoustic blind source separation
2019
IEEE Journal on Selected Topics in Signal Processing
We formulate a generic framework for blind source separation (BSS), which allows integrating data-driven spectrotemporal methods, such as deep clustering and deep attractor networks, with physically motivated ...
The integrated model exploits the complementary strengths of the two approaches to BSS: the strong modeling power of neural networks, which, however, is based on supervised learning, and the ease of unsupervised ...
More recently, [39] proposed the integration of a DNN-based mask estimator and a complex angular central Gaussian mixture model (cACGMM) to extract a single source. ...
doi:10.1109/jstsp.2019.2912565
fatcat:brneboukgneg3npnuqx4phgsom
Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation
[article]
2021
arXiv
pre-print
We then integrate multi-microphone complex spectral mapping with minimum variance distortionless response (MVDR) beamforming and post-filtering to further improve separation, and combine it with frame-level ...
We propose multi-microphone complex spectral mapping, a simple way of applying deep learning for time-varying non-linear beamforming, for speaker separation in reverberant conditions. ...
masking based MVDR for separation. ...
arXiv:2010.01703v2
fatcat:huvvxizr2jhjlhugtwk4kr7kze
Localization Based Sequential Grouping for Continuous Speech Separation
[article]
2021
arXiv
pre-print
results across blocks based on the DOA estimates. ...
This study investigates robust speaker localization for continuous speech separation and speaker diarization, where we use speaker directions to group non-contiguous segments of the same speaker. ...
The target estimates are then utilized to compute spatial covariance matrices for MVDR beamforming. ...
arXiv:2107.06853v1
fatcat:3strn7weijbhndqgstdpoxn4wu
Convolutive Prediction for Reverberant Speech Separation
[article]
2021
arXiv
pre-print
The beamforming and dereverberation results are used as extra features for a second DNN to perform better separation and dereverberation. State-of-the-art results are obtained on the SMS-WSJ corpus. ...
The key idea is to first use a deep neural network (DNN) to estimate the direct-path signal of each speaker, and then identify delayed and decayed copies of the estimated direct-path signal. ...
The major contributions of this study are the introduction of a novel dereverberation module in between the two DNNs, and its integration with beamforming. ...
arXiv:2108.07194v1
fatcat:4yd3hnpr6vb77dvpq2di2a6rci
Mask-based Neural Beamforming for Moving Speakers with Self-Attention-based Tracking
[article]
2022
arXiv
pre-print
development of mask-based beamformers robust to source movements. ...
Computing the beamforming filter requires estimating spatial covariance matrices (SCMs) of the source and noise signals. Time-frequency masks are often used to compute these SCMs. ...
There are currently two main research directions toward estimating the time-frequency masks for mask-based beamformers, i.e., spatial clustering [11] and NNs [9], [10]. ...
arXiv:2205.03568v1
fatcat:vjuxry7azfd4zbaxpaqixukjau
A multichannel learning-based approach for sound source separation in reverberant environments
2021
EURASIP Journal on Audio, Speech, and Music Processing
In the first stage, time-dilated convolutional blocks are trained to estimate the array weights for beamforming the multichannel microphone signals. ...
In the second stage, a U-net model is concatenated to the beamforming network to serve as a non-linear mapping filter for joint separation and dereverberation. ...
Mingsian Bai for his three-month visit to the LMS, FAU, Erlangen-Nuremberg, which made this research work possible. ...
doi:10.1186/s13636-021-00227-2
fatcat:3xpqaktlzbefjho4iln3dtsbne
Exploiting CNNs for Improving Acoustic Source Localization in Noisy and Reverberant Conditions
2018
IEEE Transactions on Emerging Topics in Computational Intelligence
We investigate the direction of arrival (DOA) estimation problem in noisy and reverberant conditions using a uniform linear array (ULA). ...
Experiments with both simulated and real acoustic data demonstrate the superior localization performance of the proposed SRP beamformer with respect to other state-of-the-art techniques. ...
This work was partially supported by the "Proactive Vision for advanced UAV systems for the protection of mobile units, control of territory and environmental prevention (SUPReME)" FVG L.R. 20/2015 project ...
doi:10.1109/tetci.2017.2775237
dblp:journals/tetci/SalvatiDF18
fatcat:noab3bn4izfwhbcmroqd4sqefa
Multichannel End-to-end Speech Recognition
[article]
2017
arXiv
pre-print
Experiments on the noisy speech benchmarks (CHiME-4 and AMI) show that our multichannel end-to-end system outperformed the attention-based baseline with input from a conventional adaptive beamformer. ...
In this paper we extend the end-to-end framework to encompass microphone array signal processing for noise suppression and speech enhancement within the acoustic encoding network. ...
al. 2015 uses a clustering technique to perform mask estimation rather than the neural network-based techniques, but it uses the same MVDR formulation for filter estimation. ...
arXiv:1703.04783v1
fatcat:zjwcmk4d35ddtpo7nqutyczdse
A Consolidated Perspective on Multimicrophone Speech Enhancement and Source Separation
2017
IEEE/ACM Transactions on Audio Speech and Language Processing
In addition, they are crucial pre-processing steps for noise-robust automatic speech and speaker recognition. Many devices now have two to eight microphones. ...
design criterion, c) the parameter estimation algorithm, and d) optional postfiltering. ...
Intra- and inter-node location features are integrated in a clustering-based scheme for speech separation in [355]. ...
doi:10.1109/taslp.2016.2647702
fatcat:ltfmmoguxngk5jrvzy7azzufae
DOA-guided source separation with direction-based initialization and time annotations using complex angular central Gaussian mixture models
2022
EURASIP Journal on Audio, Speech, and Music Processing
By means of spatial clustering and time-frequency masking, a mixture of multiple speakers and noise can be separated into the underlying signal components. ...
In this paper, we therefore consider three techniques to overcome these limitations using direction of arrival (DOA) estimates. First, we propose an initialization with simple DOA-based masks. ...
More recently, particularly the use of spatial clustering in conjunction with DNN-based methods for initial mask estimation has received a lot of attention. ...
doi:10.1186/s13636-022-00246-7
fatcat:eif2twjowvdvbiszuc6wesftz4