Improved Source Counting and Separation for Monaural Mixture
[article]
2020
arXiv
pre-print
embedding vectors, which are then clustered with a deep attractor network to modify the encoded features. ...
Single-channel speech separation in time domain and frequency domain has been widely studied for voice-driven applications over the past few years. ...
Separator with Unknown Number of Speakers The separator module contains three parts: an embedding network, an attractor network for mask estimation, and a source counting part. ...
arXiv:2004.00175v1
fatcat:fki5xqlbejgopf6hex5e7i7ude
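The snippets above describe estimating separation masks by clustering time-frequency embedding vectors around per-speaker attractor points. A minimal NumPy sketch of that idea follows; the oracle one-hot assignments and all shapes are illustrative assumptions (at inference, DANet-style models form attractors without oracle labels, e.g. via k-means), not the paper's exact method:

```python
import numpy as np

def attractor_masks(emb, assign):
    """emb:    (TF, D) embedding vectors, one per time-frequency bin.
    assign: (TF, C) one-hot speaker assignments (oracle, training-time only).
    Returns (TF, C) soft masks, one column per speaker."""
    # Attractor = centroid of the embeddings assigned to each speaker.
    attractors = (assign.T @ emb) / (assign.sum(axis=0)[:, None] + 1e-8)  # (C, D)
    # Similarity of every bin to every attractor; softmax gives soft masks.
    logits = emb @ attractors.T                                           # (TF, C)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)
```

Each mask column can then be applied to the mixture spectrogram to recover one speaker's magnitudes; the softmax guarantees the masks sum to one in every bin.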
Extract, Adapt and Recognize: An End-to-End Neural Network for Corrupted Monaural Speech Recognition
2019
Interspeech 2019
separation outputs using a bi-directional long short term memory network trained to minimize the recognition loss directly. ...
With speaker tracing, the WERR can be further improved to between 12.4% and 29.0%. ...
Its neural network extensions, including the deep attractor network (DANet) [24] and the deep extractor network (DENet) [20] , have been proven effective for single-channel speech separation tasks. ...
doi:10.21437/interspeech.2019-1626
dblp:conf/interspeech/LamWLMSY19
fatcat:6y4anxgldrc7xoloaj44p6n5nu
Two-stage model and optimal SI-SNR for monaural multi-speaker speech separation in noisy environment
[article]
2020
arXiv
pre-print
With the development of deep learning approaches, much progress has been made on monaural multi-speaker speech separation. ...
using deep dilated temporal convolutional networks (TCN). ...
In [13] , Deep Attractor Network (DANet) produces attractors in a deep embedding space to achieve label assignment. In [14] , a time-domain audio separation network (TasNet) is proposed. ...
arXiv:2004.06332v2
fatcat:eujanoddxrg3na5zd6psadfise
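The title above refers to the scale-invariant signal-to-noise ratio (SI-SNR), the standard training objective and metric for time-domain separation models. A minimal sketch of the usual definition (function name and epsilon handling are my assumptions):

```python
import numpy as np

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR in dB between an estimated and a reference signal.
    Both are zero-meaned so the measure ignores DC offset and gain."""
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference: the scale-invariant target.
    s_target = (est @ ref) / (ref @ ref + eps) * ref
    e_noise = est - s_target
    return 10 * np.log10((s_target @ s_target + eps) / (e_noise @ e_noise + eps))
```

Because of the projection, rescaling the estimate leaves the score unchanged, which is why SI-SNR is preferred over plain SNR for separation outputs of arbitrary gain.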
Speaker-Aware Monaural Speech Separation
2020
Interspeech 2020
Inspired by the success of speaker-specific speech extraction, in this paper, we propose a novel speaker-aware monaural speech separation model by utilizing a mask inferring neural network with the help ...
However, existing studies have not well utilized the identity context of a speaker for the inference of masks. In this paper, we propose a novel speaker-aware monaural speech separation model. ...
Dilated convolutions on both temporal and frequency domains with Gated Residual Network (GRN) were also investigated for speech separation [24] . ...
doi:10.21437/interspeech.2020-2483
dblp:conf/interspeech/XuHXT020
fatcat:zg4xuzossjerbdpqx4e2dwedua
Monaural Multi-Talker Speech Recognition with Attention Mechanism and Gated Convolutional Networks
2018
Interspeech 2018
To improve the speech recognition accuracy under the multitalker scenario, we propose a novel model architecture that incorporates the attention mechanism and gated convolutional network (GCN) into our ...
Finally the predictor generates the senone posteriors for all speaker sources independently with the knowledge from the context vectors. ...
Another model called deep attractor network (DANet) [15] learns a high-dimensional embedding of the speech spectrum and clusters embeddings with attractor points. ...
doi:10.21437/interspeech.2018-1547
dblp:conf/interspeech/ChangQ018
fatcat:37d35vvzhrccrn2isnzkfjphlm
Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation
[article]
2020
arXiv
pre-print
The dominant speech separation models are based on complex recurrent or convolutional neural networks that model speech sequences indirectly, conditioning on context, such as passing information through many ...
In this paper, we propose a dual-path transformer network (DPTNet) for end-to-end speech separation, which introduces direct context-awareness in the modeling for speech sequences. ...
In general, deep learning techniques for monaural speech separation can be divided into two categories: time-frequency (T-F) domain methods and end-to-end time-domain approaches. ...
arXiv:2007.13975v3
fatcat:vkp6il3r5neidftcco3lvojd5y
Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation
2020
Interspeech 2020
We introduce monaural speech separation with DPTNet in Section ...
Index Terms: direct context-aware modeling, transformer, dual-path network, speech separation, deep learning 1. ...
In general, deep learning techniques for monaural speech separation can be divided into two categories: time-frequency (T-F) domain methods and end-to-end time-domain approaches. ...
doi:10.21437/interspeech.2020-2205
dblp:conf/interspeech/ChenML20a
fatcat:uxnieqk3qbbhrkg5rd3bjlwnza
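Dual-path models such as DPTNet first split the long input sequence into short overlapping chunks, then alternate intra-chunk and inter-chunk processing so every element is reachable over a short path. A minimal sketch of the segmentation step (chunk and hop sizes are illustrative assumptions):

```python
import numpy as np

def segment(x, chunk, hop):
    """Split a 1-D sequence into overlapping chunks (one per row),
    zero-padding the tail. Dual-path models then model within rows
    (intra-chunk) and across rows (inter-chunk) in alternation."""
    n = len(x)
    n_chunks = max(1, int(np.ceil((n - chunk) / hop)) + 1)
    pad = (n_chunks - 1) * hop + chunk - n
    x = np.concatenate([x, np.zeros(pad)])
    return np.stack([x[i * hop : i * hop + chunk] for i in range(n_chunks)])
```

With chunk length about the square root of the sequence length, both the intra- and inter-chunk dimensions stay short, which is the point of the dual-path design.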
Monaural Audio Speaker Separation with Source Contrastive Estimation
[article]
2017
arXiv
pre-print
Our approach involves deep recurrent neural network regression to a vector space that is descriptive of independent speakers. ...
Our approach is similar to recent deep neural network clustering and permutation-invariant training research; we use weighted spectral features and masks to augment individual speaker frequencies while ...
DC is related to another approach, deep attractor networks (DA) [12] . ...
arXiv:1705.04662v1
fatcat:xb5au2ofknambjmp5kxrkbkhne
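The snippet above mentions permutation-invariant training (PIT), which resolves the label-permutation problem by scoring the model's outputs under every speaker ordering and training on the best one. A minimal sketch with a mean-squared-error criterion (the loss choice and shapes are my assumptions; this brute-force search is only practical for small speaker counts):

```python
import itertools
import numpy as np

def pit_mse(est, ref):
    """est, ref: (C, T) arrays of C estimated / reference sources.
    Returns (loss, perm): the minimum MSE over all orderings of the
    estimate rows, and the permutation that achieves it."""
    C = est.shape[0]
    best = None
    for perm in itertools.permutations(range(C)):
        loss = np.mean((est[list(perm)] - ref) ** 2)
        if best is None or loss < best[0]:
            best = (loss, perm)
    return best
```

Training on the minimum-permutation loss lets the network assign speakers to output channels freely, instead of forcing an arbitrary labeling.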
Deep Attention Fusion Feature for Speech Separation with End-to-End Post-filter Method
[article]
2020
arXiv
pre-print
In this paper, we propose an end-to-end post-filter method with deep attention fusion features for monaural speaker-independent speech separation. ...
It is a fully convolutional speech separation network and uses the waveform as the input features. ...
THE PROPOSED SPEECH SEPARATION METHOD In this paper, we propose an end-to-end post-filter (E2EPF) with deep attention fusion features for monaural speaker-independent speech separation. ...
arXiv:2003.07544v1
fatcat:crlw2wh3vzfwhkpkx6gpfpreh4
Supervised Speech Separation Based on Deep Learning: An Overview
[article]
2018
arXiv
pre-print
Much of the overview is on separation algorithms where we review monaural methods, including speech enhancement (speech-nonspeech separation), speaker separation (multi-talker separation), and speech dereverberation ...
In particular, the recent introduction of deep learning to supervised speech separation has dramatically accelerated progress and boosted separation performance. ...
We thank Masood Delfarah for help in manuscript preparation and Jun Du, Yu Tsao, Yuxuan Wang, Yong Xu, and Xueliang Zhang for helpful comments on an earlier version. ...
arXiv:1708.07524v2
fatcat:bvaa2yuppffppnta2lfpkk4v4m
Divide and Conquer: A Deep CASA Approach to Talker-independent Monaural Speaker Separation
[article]
2019
arXiv
pre-print
We address talker-independent monaural speaker separation from the perspectives of deep learning and computational auditory scene analysis (CASA). ...
The proposed deep CASA approach optimizes frame-level separation and speaker tracking in turn, and produces excellent results for both objectives. ...
Monaural Speaker Separation The goal of monaural speaker separation is to estimate C independent speech signals x c (n), c = 1, ..., C, from a single-channel recording of speech mixture y(n), where y(n ...
arXiv:1904.11148v1
fatcat:mthvxjmy3fhanhj32f77xbb6xu
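The snippet above states the monaural separation goal: recover C signals x_c(n) from a single-channel mixture y(n), where y(n) is the sum of the sources. In the time-frequency domain this is commonly done with per-source masks; a minimal sketch of the ideal ratio mask on magnitude spectrograms (shapes and the magnitude-additivity approximation are illustrative assumptions):

```python
import numpy as np

def ideal_ratio_masks(sources):
    """sources: (C, F, T) magnitude spectrograms of the C clean signals.
    Returns (C, F, T) ideal ratio masks: each source's share of the
    mixture energy in every time-frequency bin, assuming the mixture
    magnitude is approximately the sum of source magnitudes."""
    mix = sources.sum(axis=0) + 1e-8
    return sources / mix
```

Multiplying each mask by the mixture spectrogram recovers that source's magnitudes, which is the oracle upper bound that mask-estimation networks are trained toward.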
A Multi-Phase Gammatone Filterbank for Speech Separation via TasNet
[article]
2020
arXiv
pre-print
In this work, we investigate if the learned encoder of the end-to-end convolutional time domain audio separation network (Conv-TasNet) is the key to its recent success, or if the encoder can just as well ...
In contrast to a common gammatone filterbank, our filters are restricted to 2 ms length to allow for low-latency processing. ...
In contrast, deep learning approaches such as Deep Clustering [1] , Permutation Invariant Training (PIT) [2, 3] , Deep Attractor Networks [4] and Chimera++ [5] tackle the separation problem by transforming ...
arXiv:1910.11615v2
fatcat:7fteorqbxzcdxk2gf2sactb5me
Overlapped speech recognition from a jointly learned multi-channel neural speech extraction and representation
[article]
2019
arXiv
pre-print
First, based on a multi-channel convolutional TasNet with STFT kernel, we unify the multi-channel target speech enhancement front-end network and a convolutional, long short-term memory and fully connected ...
We propose an end-to-end joint optimization framework of a multi-channel neural speech extraction and deep acoustic model without mel-filterbank (FBANK) extraction for overlapped speech recognition. ...
Lam for their code assist. ...
arXiv:1910.13825v1
fatcat:6bjdns3zfzajficgqoeg3722rq
Recent progresses in deep learning based acoustic models
2017
IEEE/CAA Journal of Automatica Sinica
We further illustrate robustness issues in speech recognition systems, and discuss acoustic model adaptation, speech enhancement and separation, and robust training strategies. ...
We first discuss acoustic models that can effectively exploit variable-length contextual information, such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), and their various combination ...
Recently, researchers have developed many deep learning techniques for speech enhancement and separation. ...
doi:10.1109/jas.2017.7510508
fatcat:zcffvbg75bhllcekqghkmwidsy
Recent Progresses in Deep Learning based Acoustic Models (Updated)
[article]
2018
arXiv
pre-print
We further illustrate robustness issues in speech recognition systems, and discuss acoustic model adaptation, speech enhancement and separation, and robust training strategies. ...
We first discuss acoustic models that can effectively exploit variable-length contextual information, such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), and their various combination ...
Recently, researchers have developed many deep learning techniques for speech enhancement and separation. ...
arXiv:1804.09298v2
fatcat:yfxzxu6qanbndcnmt3loikqeym
Showing results 1 — 15 out of 59 results