LaFurca: Iterative Refined Speech Separation Based on Context-Aware Dual-Path Parallel Bi-LSTM [article]

Ziqiang Shi and Rujie Liu and Jiqing Han
2020 arXiv   pre-print
In this paper, we propose several improvements of dual-path BiLSTM based network for end-to-end approach to monaural speech separation.  ...  Deep neural network with dual-path bi-directional long short-term memory (BiLSTM) block has been proved to be very effective in sequence modeling, especially in speech separation, e.g. DPRNN-TasNet.  ...  The remainder of this paper is organized as follows: section 2 introduces end-to-end monaural speech separation based on deep neural networks with dual-path BiLSTM blocks.  ... 
arXiv:2001.08998v4 fatcat:2e36uxzugjgcdo47jsg4qe2ptm

Speech Separation Using Convolutional Neural Network and Attention Mechanism

Chun-Miao Yuan, Xue-Mei Sun, Hu Zhao
2020 Discrete Dynamics in Nature and Society  
This paper proposes a speech separation model based on convolutional neural networks and attention mechanism.  ...  Compared to the typical speech separation model DRNN-2 + discrim, this method achieves 0.27 dB GNSDR gain and 0.51 dB GSIR gain, which illustrates that the speech separation model proposed in this paper  ...  methods such as model-based methods and speech enhancement methods. (2) Newer methods using DNNs (Deep Neural Networks).  ... 
doi:10.1155/2020/2196893 doaj:c5b71d75a44b42daaa0de1388bf01d6b fatcat:7nm7niv3trfcjpxfm5x4huwvoy

Multi-Microphone Complex Spectral Mapping for Speech Dereverberation [article]

Zhong-Qiu Wang, DeLiang Wang
2020 arXiv   pre-print
This study proposes a multi-microphone complex spectral mapping approach for speech dereverberation on a fixed array geometry.  ...  Experimental results on multi-channel speech dereverberation demonstrate the effectiveness of the proposed approach.  ...  on monaural dereverberation.  ... 
arXiv:2003.01861v1 fatcat:7yaudakexrddzcyq5gpbm6flka

A Two-Stage Phase-Aware Approach for Monaural Multi-Talker Speech Separation

Lu YIN, Junfeng LI, Yonghong YAN, Masato AKAGI
2020 IEICE transactions on information and systems  
Recently, deep neural networks have dramatically improved the speech separation performance.  ...  The study implements the MISI algorithm based on the mask and gives that the ideal amplitude mask (IAM) is the optimal mask for the mask-based MISI phase recovery, which brings less phase distortion.  ...  In recent years, neural network-based speech separation has attracted increasing attention.  ... 
doi:10.1587/transinf.2019edp7259 fatcat:tdksupmtszh2hfm7qs5zhrce2a

Integrating Spectral and Spatial Features for Multi-Channel Speaker Separation

Zhong-Qiu Wang, DeLiang Wang
2018 Interspeech 2018  
Strong separation performance has been observed on a spatialized reverberant version of the wsj0-2mix corpus.  ...  This paper tightly integrates spectral and spatial information for deep learning based multi-channel speaker separation.  ...  We introduce two types of directional features, one based on compensating IPDs and the other based on beamforming.  ... 
doi:10.21437/interspeech.2018-1940 dblp:conf/interspeech/WangW18 fatcat:lataq7hgebdhzbwhitx7oabdzm

Monaural Multi-Talker Speech Recognition using Factorial Speech Processing Models [article]

Mahdi Khademian, Mohammad Mehdi Homayounpour
2016 arXiv   pre-print
The proposed method of the IBM team consists of an intermediate speech separation step followed by single-talker speech recognition.  ...  This paper reconsiders the task of this challenge based on gain adapted factorial speech processing models.  ...  The method presented in this paper is a model-based approach, built on factorial speech processing models, for recognizing monaural mixed-speech signals, which is applied for the "Monaural speech separation  ... 
arXiv:1610.01367v1 fatcat:rlcka7fkrzafbjk2bfhppewm6i

Deep Bayesian Unsupervised Source Separation Based on a Complex Gaussian Mixture Model [article]

Yoshiaki Bando, Yoko Sasaki, Kazuyoshi Yoshii
2019 arXiv   pre-print
In addition, the pre-trained network can be used not only for conducting monaural separation but also for efficiently initializing a multichannel separation algorithm.  ...  The proposed method uses a cost function based on a spatial model called a complex Gaussian mixture model (cGMM).  ...  [14] trained a monaural separation network by using source signals estimated by applying K-means clustering on interchannel phase differences (IPDs) between two microphones.  ... 
arXiv:1908.11307v1 fatcat:34gjxbsexbhynie7whhoqzpmqu

Knowledge Distillation for End-to-End Monaural Multi-Talker ASR System

Wangyou Zhang, Xuankai Chang, Yanmin Qian
2019 Interspeech 2019  
The proposed methods are evaluated on two-speaker mixed speech generated from the WSJ0 corpus, which is commonly used for this task recently.  ...  End-to-end models for monaural multi-speaker automatic speech recognition (ASR) have become an important and interesting approach when dealing with the multi-talker mixed speech under cocktail party scenario  ...  Experiments have been carried out on the PI supercomputer at Shanghai Jiao Tong University.  ... 
doi:10.21437/interspeech.2019-3192 dblp:conf/interspeech/ZhangCQ19 fatcat:yfdpkyqngff4hfvukclj5blf44

Feature Joint-State Posterior Estimation in Factorial Speech Processing Models using Deep Neural Networks [article]

Mahdi Khademian, Mohammad Mehdi Homayounpour
2017 arXiv   pre-print
The experiments compare the proposed network decoding results to those of the vector Taylor series method and show 2.3% absolute performance improvement in the monaural speech separation and recognition  ...  This paper proposes a new method for calculating joint-state posteriors of mixed-audio features using deep neural networks to be used in factorial speech processing models.  ...  Based on this assumption, we propose the following three steps for training a deep neural network for extracting joint-state posteriors: the generative phase, initializing joint-state layer weights, and  ... 
arXiv:1707.02661v1 fatcat:fbdytm5mkbepfn2gxbu56jxlle

Temporal-Spatial Neural Filter: Direction Informed End-to-End Multi-channel Target Speech Separation [article]

Rongzhi Gu, Yuexian Zou
2020 arXiv   pre-print
Despite the recent advances in deep learning based close-talk speech separation, the applications to real-world are still an open issue.  ...  Target speech separation refers to extracting the target speaker's speech from mixed signals.  ...  learning-based MSS methods, including Freq-BLSTM based speech separation methods, multi-channel deep clustering (DC) [40] and neural spatial filter [75].  ... 
arXiv:2001.00391v1 fatcat:bb33mmziofhfzd673ytisr4dwy

Iterative Deep Neural Networks for Speaker-Independent Binaural Blind Speech Separation

Qingju Liu, Yong Xu, Philip JB Jackson, Wenwu Wang, Philip Coleman
2018 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
In this paper, we propose an iterative deep neural network (DNN)-based binaural source separation scheme, for recovering two concurrent speech signals in a room environment.  ...  Index Terms: Deep neural network, binaural blind speech separation, spectral and spatial, iterative DNN  ...  INTRODUCTION Deep neural networks (DNN) [1] have recently been exploited in the field of blind source separation [2], e.g., to extract target speech corrupted by background noise [3] [4] [5] [6] [  ... 
doi:10.1109/icassp.2018.8462603 dblp:conf/icassp/Liu0JWC18 fatcat:gjlmw2uwajbq3g4o2wre5fsedi

Binaural speaker identification using the equalization-cancelation technique

Masoud Geravanchizadeh, Sina Ghalamiosgouei
2020 EURASIP Journal on Audio, Speech, and Music Processing  
Simulation results show the superiority of the proposed method in all experimental conditions.  ...  The equalization-cancelation algorithm is employed to enhance the input test speech and alleviate the detrimental effects of noise and reverberation in the speaker identification system.  ...  As one of the binaural speech segregation methods, the mask is estimated by employing a deep neural network (DNN) classification method [47, 48] .  ... 
doi:10.1186/s13636-020-00188-y fatcat:uo65ddsjdbebzm5luwcr6ypt3q

Table of Contents

2021 IEEE/ACM Transactions on Audio Speech and Language Processing  
Mesgarani Time-Domain Audio Source Separation With Neural Networks Based on Multiresolution Analysis ... Saruwatari Conditioned Source Separation for Musical Instrument Performances  ...  ... Wang Monaural Speech Separation Using Speaker Embedding From Preliminary Separation ... J.  ...  Speech Enhancement and Separation  ... 
doi:10.1109/taslp.2021.3137066 fatcat:ocit27xwlbagtjdyc652yws4xa

Multi-microphone Complex Spectral Mapping for Utterance-wise and Continuous Speech Separation [article]

Zhong-Qiu Wang and Peidong Wang and DeLiang Wang
2021 arXiv   pre-print
Assuming a fixed array geometry between training and testing, we train deep neural networks (DNN) to predict the real and imaginary (RI) components of target speech at a reference microphone from the RI  ...  Although our system is trained on simulated room impulse responses (RIR) based on a fixed number of microphones arranged in a given geometry, it generalizes well to a real array with the same geometry.  ...  using a convolutional encoder-decoder neural network (see Figure 4).  ... 
arXiv:2010.01703v2 fatcat:huvvxizr2jhjlhugtwk4kr7kze

Sequential Multi-Frame Neural Beamforming for Speech Separation and Enhancement [article]

Zhong-Qiu Wang, Hakan Erdogan, Scott Wisdom, Kevin Wilson, Desh Raj, Shinji Watanabe, Zhuo Chen, John R. Hershey
2020 arXiv   pre-print
This work introduces sequential neural beamforming, which alternates between neural network based spectral separation and beamforming based spatial separation.  ...  Our neural networks for separation use an advanced convolutional architecture trained with a novel stabilized signal-to-noise ratio loss function.  ... 
arXiv:1911.07953v3 fatcat:ruylaknm6jftzamouvkfx4akza