Unsupervised Training of Neural Mask-Based Beamforming

Lukas Drude, Jahn Heymann, Reinhold Haeb-Umbach
2019 Interspeech 2019  
The network is trained to maximize a likelihood criterion derived from a spatial mixture model of the observations.  ...  We present an unsupervised training approach for a neural network-based mask estimator in an acoustic beamforming application.  ...  This naturally included deep unfolding of non-negative matrix factorization (NMF) and also deep unfolding of complex Gaussian mixture models (cGMMs) [15].  ... 
doi:10.21437/interspeech.2019-2549 dblp:conf/interspeech/DrudeHH19 fatcat:jdcecz3lozgplcjbgb4fljkela

Unsupervised training of neural mask-based beamforming [article]

Lukas Drude, Jahn Heymann, Reinhold Haeb-Umbach
2019 arXiv   pre-print
The network is trained to maximize a likelihood criterion derived from a spatial mixture model of the observations.  ...  We present an unsupervised training approach for a neural network-based mask estimator in an acoustic beamforming application.  ...  This naturally included deep unfolding of non-negative matrix factorization (NMF) and also deep unfolding of complex Gaussian mixture models (cGMMs) [15].  ... 
arXiv:1904.01578v2 fatcat:scevjbqsirfwdaubnohtvdelle

Incorporating Noise Robustness in Speech Command Recognition by Noise Augmentation of Training Data

Ayesha Pervaiz, Fawad Hussain, Huma Israr, Muhammad Ali Tahir, Fawad Riasat Raja, Naveed Khan Baloch, Farruh Ishmanov, Yousaf Bin Zikria
2020 Sensors  
Noise is one of the major challenges in any speech recognition system, as real-time noise is a very versatile and unavoidable factor that affects the performance of speech recognition systems, particularly  ...  We thoroughly analyse the latest trends in speech recognition and evaluate the speech command dataset on different machine learning based and deep learning based techniques.  ...  Deep neural networks have been shown to give convincing improvements as compared to traditional Gaussian mixture models also on large vocabulary speech recognition tasks.  ... 
doi:10.3390/s20082326 pmid:32325814 pmcid:PMC7219662 fatcat:ftbpxexwd5fvbpj4s2cr76uybq


A variance modeling framework based on variational autoencoders for speech enhancement

Simon Leglaive, Laurent Girin, Radu Horaud
2018 2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP)  
We explore the use of neural networks as an alternative to a popular speech variance model based on supervised non-negative matrix factorization (NMF).  ...  In this paper we address the problem of enhancing speech signals in noisy mixtures using a source separation approach.  ...  For a given time-frequency point, it mainly consists in modeling the speech variance as a non-linear function of a Gaussian latent random vector, by means of a neural network.  ... 
doi:10.1109/mlsp.2018.8516711 dblp:conf/mlsp/LeglaiveGH18 fatcat:f7k5g4iuxjbdncjrztd2fb7rbe

Factorized Deep Neural Network Adaptation for Automatic Scoring of L2 Speech in English Speaking Tests

Dean Luo, Chunxiao Zhang, Linzhong Xia, Lixin Wang
2018 Interspeech 2018  
In this study, we investigate the effects of deep neural network factorized adaptation techniques on L2 speech assessment in real speaking tests.  ...  Combining the factored components of iVectors and fMLLR transforms can further improve robustness of DNN models in speech recognition and automatic scoring of L2 speech in dynamic environments.  ...  These Maximum Likelihood Linear Regression (MLLR) based acoustic model adaptation techniques are developed for traditional Gaussian mixture model (GMM) ASR systems.  ... 
doi:10.21437/interspeech.2018-2138 dblp:conf/interspeech/LuoZXW18 fatcat:iouefnxa2zflxhf4hny325lq3i

Modeling Feature Representations for Affective Speech using Generative Adversarial Networks [article]

Saurabh Sahu, Rahul Gupta, Carol Espy-Wilson
2019 arXiv   pre-print
Emotion recognition is a classic field of research with a typical setup extracting features and feeding them through a classifier for prediction.  ...  In this work, we experiment with variants of GAN architectures to generate feature vectors corresponding to an emotion in two ways: (i) A generator is trained with samples from a mixture prior.  ...  It is a deep convolutional neural network model trained on millions of images for the purpose of image classification.  ... 
arXiv:1911.00030v1 fatcat:52lc4pe7wjaqbbo3v3zoie4rmq

Speech Enhancement with Variational Autoencoders and Alpha-stable Distributions

Simon Leglaive, Umut Simsekli, Antoine Liutkus, Laurent Girin, Radu Horaud
2019 ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
We develop a Monte Carlo expectation-maximization algorithm for estimating the model parameters at test time.  ...  This paper focuses on single-channel semi-supervised speech enhancement. We learn a speaker-independent deep generative speech model using the framework of variational autoencoders.  ...  Recently, discriminative approaches based on deep neural networks (DNNs) have also been investigated for speech enhancement, with the aim of estimating either clean spectrograms or time-frequency masks  ... 
doi:10.1109/icassp.2019.8682546 dblp:conf/icassp/LeglaiveSLGH19 fatcat:gscf3fnajjg4fl3tsvnnjjqkoy

Fundamentals of speech recognition [chapter]

Jinyu Li, Li Deng, Reinhold Haeb-Umbach, Yifan Gong
2016 Robust Automatic Speech Recognition  
The Deep Neural Network (DNN) is the most important and popular deep learning model, especially for applications in speech recognition (Yu and Deng, 2014).  ...  THE BASICS OF DEEP NEURAL NETWORKS The most successful version of the DNN in speech recognition is the context-dependent deep neural network hidden Markov model (CD-DNN-HMM), where the HMM  ...  Keywords: Acoustic modeling, Language modeling, Gaussian mixture models, Hidden Markov models, Deep neural networks  ... 
doi:10.1016/b978-0-12-802398-3.00002-7 fatcat:uuz4rcgc7fawpl2rgpwdwshrim

2019 Index IEEE/ACM Transactions on Audio, Speech, and Language Processing Vol. 27

2019 IEEE/ACM Transactions on Audio Speech and Language Processing  
., +, TASLP Dec. 2019 1919-1931 Gaussian processes A Bayesian Hierarchical Model for Speech Enhancement With Time-Varying Audio Channel.  ...  ., +, TASLP Aug. 2019 1267-1279 Feedforward Exploiting Future Word Contexts in Neural Network Language Models for Speech Recognition.  ... 
doi:10.1109/taslp.2020.2971902 fatcat:j66uwjyqlfbmtgda6zhzlswpva

An information fusion approach to recognizing microphone array speech in the CHiME-3 challenge based on a deep learning framework

Jun Du, Qing Wang, Yan-Hui Tu, Xiao Bao, Li-Rong Dai, Chin-Hui Lee
2015 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)  
It is based on a deep learning framework with a large neural network consisting of subnets with different architectures.  ...  We present an information fusion approach to robust recognition of microphone array speech for the recently launched 3rd CHiME Challenge.  ...  of subnets with different architectures, namely deep neural networks (DNNs) [13] and recurrent neural networks (RNNs) [14], to combine multiple knowledge sources by early feature fusion and late model  ... 
doi:10.1109/asru.2015.7404827 dblp:conf/asru/DuWTBDL15 fatcat:vjiezngqrrdqtnbzssymawsame

A fast maximum likelihood nonlinear feature transformation method for GMM–HMM speaker adaptation

Kaisheng Yao, Dong Yu, Li Deng, Yifan Gong
2014 Neurocomputing  
We describe a novel maximum likelihood nonlinear feature bias compensation method for Gaussian mixture model-hidden Markov model (GMM-HMM) adaptation.  ...  Overall, it can reduce the WER by more than 27% over the speaker-independent system with 0.2 real-time adaptation time.  ...  Introduction Automatic speech recognition (ASR) systems rely on powerful statistical techniques such as Gaussian mixture model-hidden Markov models (GMM-HMMs) [1] and deep neural network (DNN)-HMMs  ... 
doi:10.1016/j.neucom.2013.02.050 fatcat:szitosp27nfjhdpcusewymkaou

Factor Analysis Based Speaker Verification Using ASR

Hang Su, Steven Wegmann
2016 Interspeech 2016  
We compare statistics collected from several ASR systems, and show that those collected from deep neural networks (DNN) trained with fMLLR features can effectively reduce equal error rate (EER) by more  ...  In this paper, we propose to improve speaker verification performance by importing better posterior statistics from acoustic models trained for Automatic Speech Recognition (ASR).  ...  On the other hand, the authors in [8] use bottleneck features extracted from an ASR deep neural network to do speaker and language recognition, and show that it gives better performance when compared with  ... 
doi:10.21437/interspeech.2016-1157 dblp:conf/interspeech/SuW16 fatcat:evvbzauie5bffbvoo5abg2xtfe

Large-Vocabulary Continuous Speech Recognition Systems: A Look at Some Recent Advances

George Saon, Jen-Tzung Chien
2012 IEEE Signal Processing Magazine  
Alternatively, deep neural networks hold a lot of promise for acoustic modeling although training time on large amounts of data is a limiting factor.  ...  DEEP NEURAL NETWORKS For the past 30 years or so, HMMs with state-dependent GMMs have been the de facto standard in acoustic modeling.  ... 
doi:10.1109/msp.2012.2197156 fatcat:sl3fzg2hz5emrpm6srfuc3n3ye

Target Speech Extraction Based on Blind Source Separation and X-vector-based Speaker Selection Trained with Data Augmentation [article]

Zhaoyi Gu, Lele Liao, Kai Chen, Jing Lu
2020 arXiv   pre-print
ILRMA employs nonnegative matrix factorization (NMF) to capture spectral structures of source signals and MVAE utilizes the strong modeling power of deep neural networks (DNN).  ...  In this paper, we explore a sequential approach for target speech extraction by combining blind source separation (BSS) with the x-vector based speaker recognition (SR) module.  ...  MVAE optimization process After training the CVAE model, the decoder network is utilized as a deep source model in MVAE.  ... 
arXiv:2005.07976v2 fatcat:g7yuqpvmlffgdprqjlzqpje5ly

Cumulative Adaptation for BLSTM Acoustic Models

Markus Kitza, Pavel Golik, Ralf Schlüter, Hermann Ney
2019 Interspeech 2019  
A bidirectional Long Short-Term Memory (BLSTM) based neural network, capable of learning temporal relationships and translation invariant representations, is used for robust acoustic modeling.  ...  This paper addresses the robust speech recognition problem as an adaptation task. Specifically, we investigate the cumulative application of adaptation methods.  ...  Introduction The application of deep neural networks to speech recognition has achieved tremendous success due to its superior performance over the traditional hidden Markov model with Gaussian mixture  ... 
doi:10.21437/interspeech.2019-2162 dblp:conf/interspeech/KitzaGSN19 fatcat:xmtnbche35grxnx7kmdoftwndu