59 Hits in 3.2 sec

Improved Source Counting and Separation for Monaural Mixture [article]

Yiming Xiao, Haijian Zhang
2020 arXiv   pre-print
embedding vectors which are then clustered with a deep attractor network to modify the encoded feature.  ...  Single-channel speech separation in the time domain and frequency domain has been widely studied for voice-driven applications over the past few years.  ...  Separator with Unknown Number of Speakers: The separator module contains three parts: an embedding network, an attractor network for mask estimation, and a source counting part.  ...
arXiv:2004.00175v1 fatcat:fki5xqlbejgopf6hex5e7i7ude
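
The attractor mechanism this snippet alludes to (embeddings clustered around per-source attractor points, which then yield masks) can be written down in a few lines of numpy. The following is a minimal sketch of the general DANet idea, not this paper's code; all names and shapes are illustrative assumptions:

```python
import numpy as np

def danet_step(V, Y):
    """One deep-attractor step (illustrative sketch, not the paper's code).

    V: (TF, D) embeddings, one D-dim vector per time-frequency bin.
    Y: (TF, C) one-hot ideal assignments of bins to C sources
       (available at training time only).
    Returns (C, D) attractors and (TF, C) estimated masks.
    """
    # Attractors: per-source centroids of the embeddings, weighted by
    # the ideal assignment of each T-F bin.
    A = (Y.T @ V) / (Y.sum(axis=0, keepdims=True).T + 1e-8)  # (C, D)
    # Masks: softmax over the similarity of each embedding to each attractor.
    logits = V @ A.T                                          # (TF, C)
    logits -= logits.max(axis=1, keepdims=True)               # numerical stability
    M = np.exp(logits)
    M /= M.sum(axis=1, keepdims=True)
    return A, M

# Toy usage: 100 T-F bins, 20-dim embeddings, 2 sources.
rng = np.random.default_rng(0)
V = rng.standard_normal((100, 20))
Y = np.eye(2)[rng.integers(0, 2, size=100)]
A, M = danet_step(V, Y)
print(A.shape, M.shape)  # (2, 20) (100, 2)
```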

Extract, Adapt and Recognize: An End-to-End Neural Network for Corrupted Monaural Speech Recognition

Max W.Y. Lam, Jun Wang, Xunying Liu, Helen Meng, Dan Su, Dong Yu
2019 Interspeech 2019  
separation outputs using a bi-directional long short-term memory network trained to minimize the recognition loss directly.  ...  With speaker tracing, the WERR can be further improved to 12.4% to 29.0%.  ...  Its neural network extensions, including the deep attractor network (DANet) [24] and the deep extractor network (DENet) [20], have been proven effective for single-channel speech separation tasks.  ...
doi:10.21437/interspeech.2019-1626 dblp:conf/interspeech/LamWLMSY19 fatcat:6y4anxgldrc7xoloaj44p6n5nu

Two-stage model and optimal SI-SNR for monaural multi-speaker speech separation in noisy environment [article]

Chao Ma, Dongmei Li, Xupeng Jia
2020 arXiv   pre-print
With the development of deep learning approaches, much progress has been made on monaural multi-speaker speech separation.  ...  using deep dilated temporal convolutional networks (TCN).  ...  In [13], the Deep Attractor Network (DANet) produces attractors in a deep embedding space to achieve label assignment. In [14], a time-domain audio separation network (TasNet) is proposed.  ...
arXiv:2004.06332v2 fatcat:eujanoddxrg3na5zd6psadfise
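
The SI-SNR objective in this title has a standard closed form: project the estimate onto the reference, treat the residual as noise, and take the log power ratio. A minimal sketch (variable names are mine, not the paper's):

```python
import numpy as np

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR in dB between an estimated and a reference signal."""
    # Remove the means so the measure is invariant to DC offset.
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference (the "target" component);
    # everything orthogonal to the reference counts as noise.
    s_target = np.dot(est, ref) / (np.dot(ref, ref) + eps) * ref
    e_noise = est - s_target
    return 10 * np.log10(np.dot(s_target, s_target) / (np.dot(e_noise, e_noise) + eps))

# Example: a copy of the reference with 10% additive noise scores about 20 dB.
rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)
est = ref + 0.1 * rng.standard_normal(16000)
print(round(float(si_snr(est, ref)), 1))
```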

Speaker-Aware Monaural Speech Separation

Jiahao Xu, Kun Hu, Chang Xu, Duc Chung Tran, Zhiyong Wang
2020 Interspeech 2020  
Inspired by the success of speaker-specific speech extraction, in this paper, we propose a novel speaker-aware monaural speech separation model by utilizing a mask-inferring neural network with the help  ...  However, existing studies have not made good use of the identity context of a speaker for the inference of masks. In this paper, we propose a novel speaker-aware monaural speech separation model.  ...  Dilated convolutions on both the temporal and frequency domains with a Gated Residual Network (GRN) were also investigated for speech separation [24].  ...
doi:10.21437/interspeech.2020-2483 dblp:conf/interspeech/XuHXT020 fatcat:zg4xuzossjerbdpqx4e2dwedua

Monaural Multi-Talker Speech Recognition with Attention Mechanism and Gated Convolutional Networks

Xuankai Chang, Yanmin Qian, Dong Yu
2018 Interspeech 2018  
To improve speech recognition accuracy in the multi-talker scenario, we propose a novel model architecture that incorporates the attention mechanism and gated convolutional network (GCN) into our  ...  Finally, the predictor generates the senone posteriors for all speaker sources independently with the knowledge from the context vectors.  ...  Another model, called the deep attractor network (DANet) [15], learns a high-dimensional embedding of the speech spectrum and clusters embeddings with attractor points.  ...
doi:10.21437/interspeech.2018-1547 dblp:conf/interspeech/ChangQ018 fatcat:37d35vvzhrccrn2isnzkfjphlm

Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation [article]

Jingjing Chen, Qirong Mao, Dong Liu
2020 arXiv   pre-print
The dominant speech separation models are based on complex recurrent or convolutional neural networks that model speech sequences indirectly conditioned on context, such as passing information through many  ...  In this paper, we propose a dual-path transformer network (DPTNet) for end-to-end speech separation, which introduces direct context-awareness in the modeling of speech sequences.  ...  In general, deep learning techniques for monaural speech separation can be divided into two categories: time-frequency (T-F) domain methods and end-to-end time-domain approaches.  ...
arXiv:2007.13975v3 fatcat:vkp6il3r5neidftcco3lvojd5y
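
The dual-path scheme described here rests on a segmentation step: the long input sequence is cut into short overlapping chunks, so the model can alternate between an intra-chunk axis and an inter-chunk axis, neither of which spans the full sequence length. A sketch under assumed shapes (not the authors' code):

```python
import numpy as np

def segment(x, chunk, hop):
    """Split a (D, T) sequence into overlapping chunks -> (D, chunk, N).

    Dual-path models then alternate processing along the intra-chunk axis
    (length `chunk`) and the inter-chunk axis (length N), so neither path
    ever attends over the full length T at once.
    """
    D, T = x.shape
    n = 1 + max(0, (T - chunk + hop - 1) // hop)    # number of chunks
    pad = (n - 1) * hop + chunk - T                 # zero-pad the tail
    x = np.pad(x, ((0, 0), (0, pad)))
    return np.stack([x[:, i * hop:i * hop + chunk] for i in range(n)], axis=-1)

x = np.random.randn(64, 1000)     # e.g. 64 feature channels over 1000 frames
c = segment(x, chunk=100, hop=50)
print(c.shape)                    # (64, 100, 19)
```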

Dual-Path Transformer Network: Direct Context-Aware Modeling for End-to-End Monaural Speech Separation

Jingjing Chen, Qirong Mao, Dong Liu
2020 Interspeech 2020  
We introduce monaural speech separation with DPTNet in Section  ...  Index Terms: direct context-aware modeling, transformer, dual-path network, speech separation, deep learning  ...  In general, deep learning techniques for monaural speech separation can be divided into two categories: time-frequency (T-F) domain methods and end-to-end time-domain approaches.  ...
doi:10.21437/interspeech.2020-2205 dblp:conf/interspeech/ChenML20a fatcat:uxnieqk3qbbhrkg5rd3bjlwnza

Monaural Audio Speaker Separation with Source Contrastive Estimation [article]

Cory Stephenson, Patrick Callier, Abhinav Ganesh, Karl Ni
2017 arXiv   pre-print
Our approach involves a deep recurrent neural network regressing to a vector space that is descriptive of independent speakers.  ...  Our approach is similar to recent deep neural network clustering and permutation-invariant training research; we use weighted spectral features and masks to augment individual speaker frequencies while  ...  DC is related to another approach, deep attractor networks (DA) [12].  ...
arXiv:1705.04662v1 fatcat:xb5au2ofknambjmp5kxrkbkhne
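
The "deep neural network clustering" (DC) that this abstract relates itself to is typically trained with the deep clustering affinity loss ||VV^T - YY^T||_F^2, which can be evaluated without materializing the large TF x TF affinity matrices. A minimal numpy version in my own notation:

```python
import numpy as np

def dc_loss(V, Y):
    """Deep clustering loss ||V V^T - Y Y^T||_F^2.

    V: (TF, D) unit-norm embeddings; Y: (TF, C) one-hot source labels.
    Expanding the Frobenius norm avoids the TF x TF matrices:
    ||VV^T - YY^T||_F^2 = ||V^T V||_F^2 - 2 ||V^T Y||_F^2 + ||Y^T Y||_F^2.
    """
    return (np.sum((V.T @ V) ** 2)
            - 2 * np.sum((V.T @ Y) ** 2)
            + np.sum((Y.T @ Y) ** 2))

rng = np.random.default_rng(0)
V = rng.standard_normal((100, 20))
V /= np.linalg.norm(V, axis=1, keepdims=True)   # unit-norm embeddings
Y = np.eye(2)[rng.integers(0, 2, size=100)]     # random 2-source labels
print(dc_loss(V, Y))  # nonnegative; 0 only for perfectly clustered embeddings
```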

Deep Attention Fusion Feature for Speech Separation with End-to-End Post-filter Method [article]

Cunhang Fan and Jianhua Tao and Bin Liu and Jiangyan Yi and Zhengqi Wen and Xuefei Liu
2020 arXiv   pre-print
In this paper, we propose an end-to-end post-filter method with deep attention fusion features for monaural speaker-independent speech separation.  ...  It is a fully convolutional speech separation network and uses the waveform as its input features.  ...  The Proposed Speech Separation Method: In this paper, we propose an end-to-end post-filter (E2EPF) with deep attention fusion features for monaural speaker-independent speech separation.  ...
arXiv:2003.07544v1 fatcat:crlw2wh3vzfwhkpkx6gpfpreh4

Supervised Speech Separation Based on Deep Learning: An Overview [article]

DeLiang Wang, Jitong Chen
2018 arXiv   pre-print
Much of the overview is on separation algorithms where we review monaural methods, including speech enhancement (speech-nonspeech separation), speaker separation (multi-talker separation), and speech dereverberation  ...  In particular, the recent introduction of deep learning to supervised speech separation has dramatically accelerated progress and boosted separation performance.  ...  We thank Masood Delfarah for help in manuscript preparation and Jun Du, Yu Tsao, Yuxuan Wang, Yong Xu, and Xueliang Zhang for helpful comments on an earlier version.  ... 
arXiv:1708.07524v2 fatcat:bvaa2yuppffppnta2lfpkk4v4m

Divide and Conquer: A Deep CASA Approach to Talker-independent Monaural Speaker Separation [article]

Yuzhou Liu, DeLiang Wang
2019 arXiv   pre-print
We address talker-independent monaural speaker separation from the perspectives of deep learning and computational auditory scene analysis (CASA).  ...  The proposed deep CASA approach optimizes frame-level separation and speaker tracking in turn, and produces excellent results for both objectives.  ...  Monaural Speaker Separation: The goal of monaural speaker separation is to estimate C independent speech signals x_c(n), c = 1, ..., C, from a single-channel recording of a speech mixture y(n), where y(n) = x_1(n) + ... + x_C(n).  ...
arXiv:1904.11148v1 fatcat:mthvxjmy3fhanhj32f77xbb6xu
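
Frame-level separation of this kind is commonly trained with permutation-invariant training (PIT): the loss is evaluated under the best pairing of outputs to references, so the network is not penalized for emitting speakers in a different order than the labels. A minimal utterance-level sketch with an assumed MSE loss (illustrative, not the paper's frame-level variant):

```python
import numpy as np
from itertools import permutations

def pit_mse(est, ref):
    """Permutation-invariant MSE over C sources.

    est, ref: (C, T) arrays. Returns (loss, perm) for the output-to-label
    permutation with the smallest mean squared error.
    """
    C = est.shape[0]
    best = None
    for perm in permutations(range(C)):
        loss = np.mean((est[list(perm)] - ref) ** 2)
        if best is None or loss < best[0]:
            best = (loss, perm)
    return best

est = np.stack([np.ones(10), np.zeros(10)])
ref = np.stack([np.zeros(10), np.ones(10)])   # labels in swapped order
print(pit_mse(est, ref))                      # (0.0, (1, 0))
```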

A Multi-Phase Gammatone Filterbank for Speech Separation via TasNet [article]

David Ditter, Timo Gerkmann
2020 arXiv   pre-print
In this work, we investigate if the learned encoder of the end-to-end convolutional time domain audio separation network (Conv-TasNet) is the key to its recent success, or if the encoder can just as well  ...  In contrast to a common gammatone filterbank, our filters are restricted to 2 ms length to allow for low-latency processing.  ...  In contrast, deep learning approaches such as Deep Clustering [1] , Permutation Invariant Training (PIT) [2, 3] , Deep Attractor Networks [4] and Chimera++ [5] tackle the separation problem by transforming  ... 
arXiv:1910.11615v2 fatcat:7fteorqbxzcdxk2gf2sactb5me
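
The multi-phase gammatone encoder investigated here can be sketched by sampling the gammatone impulse response g(t) = t^(n-1) e^(-2 pi b t) cos(2 pi f_c t + phi) on a grid of center frequencies and phases, truncated to 2 ms. The frequency grid, ERB bandwidth rule, and phase grid below are illustrative assumptions, not necessarily the paper's exact design:

```python
import numpy as np

def gammatone(fc, phase, fs=8000, length_ms=2.0, order=4):
    """One gammatone filter, truncated to `length_ms` for low-latency use."""
    t = np.arange(int(fs * length_ms / 1000)) / fs
    b = 24.7 + fc / 9.265   # ERB bandwidth (Glasberg & Moore approximation)
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t + phase)
    return g / (np.linalg.norm(g) + 1e-8)   # unit-norm filter

# A small multi-phase bank: every center frequency appears at several phases.
fcs = np.geomspace(100, 3800, num=32)                    # assumed frequency grid
phases = np.linspace(0, np.pi, num=4, endpoint=False)    # assumed phase grid
bank = np.stack([gammatone(fc, p) for fc in fcs for p in phases])
print(bank.shape)   # (128, 16): 128 filters of 16 samples (2 ms at 8 kHz)
```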

Overlapped speech recognition from a jointly learned multi-channel neural speech extraction and representation [article]

Bo Wu, Meng Yu, Lianwu Chen, Chao Weng, Dan Su, Dong Yu
2019 arXiv   pre-print
First, based on a multi-channel convolutional TasNet with STFT kernel, we unify the multi-channel target speech enhancement front-end network and a convolutional, long short-term memory and fully connected  ...  We propose an end-to-end joint optimization framework of a multi-channel neural speech extraction and deep acoustic model without mel-filterbank (FBANK) extraction for overlapped speech recognition.  ...  Lam for their code assist.  ... 
arXiv:1910.13825v1 fatcat:6bjdns3zfzajficgqoeg3722rq

Recent progresses in deep learning based acoustic models

Dong Yu, Jinyu Li
2017 IEEE/CAA Journal of Automatica Sinica  
We further illustrate robustness issues in speech recognition systems, and discuss acoustic model adaptation, speech enhancement and separation, and robust training strategies.  ...  We first discuss acoustic models that can effectively exploit variable-length contextual information, such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), and their various combinations  ...  Recently, researchers have developed many deep learning techniques for speech enhancement and separation.  ...
doi:10.1109/jas.2017.7510508 fatcat:zcffvbg75bhllcekqghkmwidsy

Recent Progresses in Deep Learning based Acoustic Models (Updated) [article]

Dong Yu, Jinyu Li
2018 arXiv   pre-print
We further illustrate robustness issues in speech recognition systems, and discuss acoustic model adaptation, speech enhancement and separation, and robust training strategies.  ...  We first discuss acoustic models that can effectively exploit variable-length contextual information, such as recurrent neural networks (RNNs), convolutional neural networks (CNNs), and their various combinations  ...  Recently, researchers have developed many deep learning techniques for speech enhancement and separation.  ...
arXiv:1804.09298v2 fatcat:yfxzxu6qanbndcnmt3loikqeym
Showing results 1 — 15 out of 59 results