
One-Pass Single-Channel Noisy Speech Recognition Using a Combination of Noisy and Enhanced Features

Masakiyo Fujimoto, Hisashi Kawai
2019 Interspeech 2019  
To overcome these problems, we propose a noise-robust acoustic modeling framework based on a feature-level combination of noisy speech and enhanced speech.  ...  This paper introduces a method of noise-robust automatic speech recognition (ASR) that remains effective under one-pass single-channel processing.  ...  The proposed method successfully reduces the influence of speech distortion caused by speech enhancement by using a combination of noisy speech and enhanced speech features as the input for speech recognition  ... 
doi:10.21437/interspeech.2019-1270 dblp:conf/interspeech/FujimotoK19 fatcat:v3aexljkrrdqvae46z4mbpgqlu
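The excerpt does not spell out how the noisy and enhanced streams are combined; a minimal sketch of one common feature-level combination, frame-wise concatenation (an assumption for illustration, not necessarily the authors' exact scheme), could look like:

```python
import numpy as np

def combine_features(noisy_feats, enhanced_feats):
    """Concatenate noisy and enhanced feature frames along the feature axis.

    Both inputs are (num_frames, feat_dim) arrays, e.g. log-mel filterbank
    features; the (num_frames, 2 * feat_dim) result feeds the acoustic model.
    """
    assert noisy_feats.shape == enhanced_feats.shape
    return np.concatenate([noisy_feats, enhanced_feats], axis=1)

# Toy example: 100 frames of 40-dimensional features from each stream.
rng = np.random.default_rng(0)
noisy = rng.standard_normal((100, 40))
enhanced = rng.standard_normal((100, 40))
combined = combine_features(noisy, enhanced)
```

The acoustic model then sees, in every frame, both the undistorted noisy evidence and the denoised (but possibly distorted) evidence.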

Multichannel speech recognition using distributed microphone signal fusion strategies

Marek B. Trawicki, Michael T. Johnson, An Ji, Tomasz S. Osiejuk
2012 2012 International Conference on Audio, Language and Image Processing  
The signals are first fused together based on various heuristics, including their amplitudes, variances, physical distance, or squared distance, before passing the enhanced single-channel signal into the  ...  By combining the noisy distributed microphone signals in an intelligent way that is proportional to the information contained in the signals, speech recognition systems can achieve higher recognition accuracies  ...  IIS-0326395) and DOE (GAANN Grant P200A010104) for supporting this work.  ... 
doi:10.1109/icalip.2012.6376789 fatcat:nkokhnbkwfaphhtx34acsjqb4e
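Variance is one of the fusion heuristics the abstract lists; a rough sketch of variance-proportional channel weighting (the exact weighting rule here is an illustrative assumption, not the paper's formula):

```python
import numpy as np

def fuse_channels(signals):
    """Fuse distributed microphone signals with weights proportional to each
    channel's variance, a rough proxy for how much signal energy it carries.

    signals: (num_channels, num_samples) array of time-aligned waveforms.
    Returns a single fused waveform of length num_samples.
    """
    variances = signals.var(axis=1)
    weights = variances / variances.sum()   # normalize weights to sum to 1
    return weights @ signals                # weighted sum over channels

# Three channels: a strong copy of a tone, a weak copy, and near-silence.
t = np.linspace(0, 1, 8000)
clean = np.sin(2 * np.pi * 440 * t)
signals = np.stack([1.0 * clean, 0.5 * clean, 0.01 * clean])
fused = fuse_channels(signals)
```

The strong channel dominates the fused output, which is the intended behavior: channels carrying more information contribute proportionally more.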

Speech Enhancement with LSTM Recurrent Neural Networks and its Application to Noise-Robust ASR [chapter]

Felix Weninger, Hakan Erdogan, Shinji Watanabe, Emmanuel Vincent, Jonathan Le Roux, John R. Hershey, Björn Schuller
2015 Lecture Notes in Computer Science  
We demonstrate that LSTM speech enhancement, even when used 'naïvely' as front-end processing, delivers competitive results on the CHiME-2 speech recognition task.  ...  We evaluate some recent developments in recurrent neural network (RNN) based speech enhancement in the light of noise-robust automatic speech recognition (ASR).  ...  Third, we provide a systematic comparison of single-channel and two-channel methods, showing that RNN-based single-channel enhancement can yield a recognition performance that is on par with the previous  ... 
doi:10.1007/978-3-319-22482-4_11 fatcat:svhk3wtwzbejrcpr5qhxxu3xvq
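The LSTM itself is omitted here, but the quantity such enhancers typically estimate is a time-frequency mask; the following sketches an ideal-ratio-mask training target and its application to the noisy STFT (standard practice in mask-based enhancement, assumed rather than taken from the chapter):

```python
import numpy as np

def ideal_ratio_mask(speech_mag, noise_mag):
    """IRM-style training target: per time-frequency bin, the fraction of
    magnitude attributable to speech. Values lie in [0, 1]."""
    return speech_mag / (speech_mag + noise_mag + 1e-8)

def apply_mask(noisy_stft, mask):
    """Enhance by scaling each noisy STFT bin; the noisy phase is kept."""
    return mask * noisy_stft

# Toy spectrogram: 5 frames x 4 frequency bins.
rng = np.random.default_rng(0)
speech = rng.uniform(0.5, 1.0, (5, 4))          # clean magnitudes
noise = rng.uniform(0.0, 0.5, (5, 4))           # noise magnitudes
noisy = (speech + noise) * np.exp(1j * rng.uniform(0, np.pi, (5, 4)))
mask = ideal_ratio_mask(speech, noise)          # oracle target the net learns
enhanced = apply_mask(noisy, mask)
```

At training time the network regresses toward masks like this one; at test time its predicted mask replaces the oracle.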

Feature mapping using far-field microphones for distant speech recognition

Ivan Himawan, Petr Motlicek, David Imseng, Sridha Sridharan
2016 Speech Communication  
The individual and combined impacts of beamforming and speaker adaptation techniques, along with the feature mapping, are examined for distant large-vocabulary speech recognition, using a single and multiple  ...  The experimental results on the AMI meeting corpus show that the feature mapping, used in combination with beamforming and speaker adaptation, yields a distant speech recognition performance below 50% word  ...  The beamformed audio may then be enhanced using post-filtering [36] before it is passed to a speech recognizer as single-channel speech.  ... 
doi:10.1016/j.specom.2016.07.003 fatcat:4jtd63sdqbfz3iinkj6g3jmmci
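The paper maps far-field features toward close-talk features with a neural network; as a hedged stand-in for that learned mapping, the same idea can be sketched with a linear least-squares fit on synthetic features:

```python
import numpy as np

# Hypothetical setup: fit a linear map W from far-field features X to
# close-talk features Y by least squares, then apply it to unseen frames.
# (The paper's mapping is a neural network; this is only the linear analogue.)
rng = np.random.default_rng(1)
W_true = rng.standard_normal((40, 40))                   # unknown true mapping
X = rng.standard_normal((500, 40))                       # far-field frames
Y = X @ W_true + 0.01 * rng.standard_normal((500, 40))   # close-talk targets

W, *_ = np.linalg.lstsq(X, Y, rcond=None)                # learned mapping
X_test = rng.standard_normal((10, 40))
Y_mapped = X_test @ W                                    # "close-talk-like" features
```

The mapped features are then consumed by the recognizer as if they were close-talk features; a neural mapping simply replaces `W` with a nonlinear function.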

An iterative longest matching segment approach to speech enhancement with additive noise and channel distortion

Ji Ming, Danny Crookes
2014 Computer Speech and Language  
Introduction: This paper presents a new approach to speech enhancement from single-channel measurements involving noise, channel distortion (i.e., convolutional noise) and their combination, and demonstrates  ...  The use of our enhancement approach as a preprocessor for feature extraction significantly improved the performance of a baseline recognition system for dealing with noisy speech with additive noise  ... 
doi:10.1016/j.csl.2014.04.003 fatcat:km35di7qd5a2ddpd72w5xd6kwa

Enhancement of Spatial Clustering-Based Time-Frequency Masks using LSTM Neural Networks [article]

Felix Grezes, Zhaoheng Ni, Viet Anh Trinh, Michael Mandel
2020 arXiv   pre-print
By using LSTMs to enhance spatial clustering-based time-frequency masks, we achieve both the signal modeling performance of multiple single-channel LSTM-DNN speech enhancers and the signal separation performance  ...  Recent works have shown that Deep Recurrent Neural Networks using the LSTM architecture can achieve strong single-channel speech enhancement by estimating time-frequency masks.  ...  We train a distinct LSTM model that uses the single-channel noisy audio to enhance the masks produced by MESSL.  ... 
arXiv:2012.01576v1 fatcat:75oq625fxjg5pg7cipvisfahbi

The MERL/SRI system for the 3RD CHiME challenge using beamforming, robust feature extraction, and advanced speech recognition

Takaaki Hori, Zhuo Chen, Hakan Erdogan, John R. Hershey, Jonathan Le Roux, Vikramjit Mitra, Shinji Watanabe
2015 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)  
Two different types of beamforming are used to combine multi-microphone signals to obtain a single higher-quality signal.  ...  The beamformed signal is further processed by a single-channel bi-directional long short-term memory (LSTM) enhancement network, which is used to extract stacked mel-frequency cepstral coefficient (MFCC) features  ...  In one system, BLSTM-based single-channel speech enhancement is used to further enhance the weighted delay-and-sum beamformed signal ŷ, as described in Section 2.3, and MFCC features (x_B) are extracted  ... 
doi:10.1109/asru.2015.7404833 dblp:conf/asru/HoriCEHRMW15 fatcat:6rj23xsevrbbzlzjbvowd7wflq
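The weighted delay-and-sum beamforming mentioned above can be sketched in a simplified form with known integer sample delays and equal channel weights (the real system estimates time delays and per-channel weights from the data):

```python
import numpy as np

def delay_and_sum(signals, delays):
    """Delay-and-sum beamformer with equal weights: shift each channel by
    its integer sample delay so the target source aligns, then average.

    signals: (num_channels, num_samples); delays: per-channel delays in samples.
    """
    aligned = [np.roll(sig, -d) for sig, d in zip(signals, delays)]
    return np.mean(aligned, axis=0)

# Two copies of the same source arriving 3 and 7 samples late.
rng = np.random.default_rng(2)
source = rng.standard_normal(1000)
ch0 = np.roll(source, 3)
ch1 = np.roll(source, 7)
beamformed = delay_and_sum(np.stack([ch0, ch1]), delays=[3, 7])
```

Aligning before averaging reinforces the target source while uncorrelated noise on each channel is attenuated by the averaging.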

Multichannel End-to-end Speech Recognition [article]

Tsubasa Ochiai, Shinji Watanabe, Takaaki Hori, John R. Hershey
2017 arXiv   pre-print
Experiments on the noisy speech benchmarks (CHiME-4 and AMI) show that our multichannel end-to-end system outperformed the attention-based baseline with input from a conventional adaptive beamformer.  ...  The field of speech recognition is in the midst of a paradigm shift: end-to-end neural networks are challenging the dominance of hidden Markov models as a core technology.  ...  As regards end-to-end speech recognition, all existing studies are based on a single-channel setup.  ... 
arXiv:1703.04783v1 fatcat:zjwcmk4d35ddtpo7nqutyczdse

The third 'CHiME' speech separation and recognition challenge: Dataset, task and baselines

Jon Barker, Ricard Marxer, Emmanuel Vincent, Shinji Watanabe
2015 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)  
This paper presents the design and outcomes of the 3rd CHiME Challenge, which targets the performance of automatic speech recognition in a real-world, commercially-motivated scenario: a person talking  ...  The CHiME challenge series aims to advance far-field speech recognition technology by promoting research at the interface of signal processing and automatic speech recognition.  ...  Enhancement: The speech enhancement baseline aims to transform the multichannel noisy input signal into a single-channel enhanced output signal suitable for ASR processing.  ... 
doi:10.1109/asru.2015.7404837 dblp:conf/asru/BarkerMVW15 fatcat:ggyofy6w4ffojhxw6e7bsnynbq

Improved MVDR Beamforming Using Single-Channel Mask Prediction Networks

Hakan Erdogan, John R. Hershey, Shinji Watanabe, Michael I. Mandel, Jonathan Le Roux
2016 Interspeech 2016  
We show that using a single mask across microphones for covariance prediction with minima-limited post-masking yields the best result in terms of signal-level quality measures and speech recognition word  ...  Index Terms: Microphone arrays, neural networks, speech enhancement, MVDR beamforming, LSTM  ...  as: ŷ_{t,f} = g_f s_{t,f} + n_{t,f}, where ŷ_{t,f}, s_{t,f} and n_{t,f} are the STFT coefficients of the noisy, clean and  ...  yield much better performance than the closest competitor in speech enhancement of single-channel noisy speech [12, 13].  ... 
doi:10.21437/interspeech.2016-552 dblp:conf/interspeech/ErdoganHWMR16 fatcat:6ac7xcoehbe5xblf6kmqlziq7y
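A compact sketch of mask-based MVDR for one frequency bin, following the covariance-from-masks recipe the abstract describes; the toy data and the oracle mask below are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def mvdr_weights(Y, mask):
    """Mask-based MVDR for a single frequency bin.

    Y: (num_frames, num_channels) complex STFT observations.
    mask: (num_frames,) estimated speech-presence probability in [0, 1].
    Spatial covariances are mask-weighted outer products; the steering
    vector is taken as the principal eigenvector of the speech covariance.
    """
    outer = Y[:, :, None] * Y[:, None, :].conj()                 # per-frame y y^H
    Phi_s = (mask[:, None, None] * outer).mean(axis=0)           # speech covariance
    Phi_n = ((1.0 - mask)[:, None, None] * outer).mean(axis=0)   # noise covariance
    h = np.linalg.eigh(Phi_s)[1][:, -1]                          # dominant eigenvector
    w = np.linalg.solve(Phi_n, h)
    return w / (h.conj() @ w)                                    # w^H h = 1 (distortionless)

# Toy bin: 100 speech frames with a fixed spatial signature, then 100 noise-only.
rng = np.random.default_rng(3)
d = np.array([1.0, 1.0j])                                        # hypothetical steering vector
s = np.concatenate([rng.standard_normal(100) + 1j * rng.standard_normal(100),
                    np.zeros(100)])
n = 0.1 * (rng.standard_normal((200, 2)) + 1j * rng.standard_normal((200, 2)))
Y = s[:, None] * d + n
mask = np.concatenate([np.ones(100), np.zeros(100)])             # oracle mask for the toy
w = mvdr_weights(Y, mask)
out = Y @ w.conj()                                               # w^H y per frame
```

The final normalization enforces the distortionless constraint toward the estimated steering direction, so the beamformed output on speech frames is (up to a complex scale) the source signal.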

Improving the Intelligibility of Electric and Acoustic Stimulation Speech Using Fully Convolutional Networks Based Speech Enhancement [article]

Natalie Yu-Hsien Wang, Hsiao-Lan Sharon Wang, Tao-Wei Wang, Szu-Wei Fu, Xugang Lu, Yu Tsao, Hsin-Min Wang
2019 arXiv   pre-print
Recently, a time-domain speech enhancement algorithm based on fully convolutional neural networks (FCN) with a short-time objective intelligibility (STOI)-based objective function (termed FCN(S) in  ...  Combined electric and acoustic stimulation (EAS) has demonstrated better speech recognition than a conventional cochlear implant (CI) and yielded satisfactory performance under quiet conditions.  ...  It is clear that directly using the phase of the noisy speech is not optimal and may degrade the enhanced speech quality.  ... 
arXiv:1909.11912v1 fatcat:heuofoaju5cb7o4lphaq2i6ttq
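The noisy-phase issue the excerpt raises comes from the standard magnitude-domain reconstruction shortcut, sketched here; time-domain models such as the FCN approach above avoid it by predicting the waveform directly:

```python
import numpy as np

def reconstruct_with_noisy_phase(estimated_mag, noisy_stft):
    """Common magnitude-domain enhancement shortcut: pair the estimated clean
    magnitude with the phase of the noisy STFT. The mismatched phase caps the
    achievable quality, which is the drawback the paper points to."""
    noisy_phase = np.angle(noisy_stft)
    return estimated_mag * np.exp(1j * noisy_phase)

# Toy STFT and a pretend-enhanced magnitude (80% of the noisy magnitude).
rng = np.random.default_rng(4)
noisy = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))
est_mag = 0.8 * np.abs(noisy)
enhanced = reconstruct_with_noisy_phase(est_mag, noisy)
```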

The ICSTM+TUM+UP Approach to the 3rd CHiME Challenge: Single-Channel LSTM Speech Enhancement with Multi-Channel Correlation Shaping Dereverberation and LSTM Language Models [article]

Amr El-Desoky Mousa, Erik Marchi, Björn Schuller
2015 arXiv   pre-print
Our system uses Bidirectional Long Short-Term Memory (BLSTM) Recurrent Neural Networks (RNNs) for Single-channel Speech Enhancement (SSE).  ...  Networks are trained to predict clean speech as well as noise features from noisy speech features.  ...  German Federal Ministry of Education, Science, Research and Technology (BMBF) under grant agreement No. 16SV7213 (EmotAsS).  ... 
arXiv:1510.00268v1 fatcat:ufslzkzzcbcbrblagde2viv33i

Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments [article]

Zixing Zhang, Jürgen Geiger, Jouni Pohjalainen, Amr El-Desoky Mousa, Wenyu Jin, Björn Schuller
2018 arXiv   pre-print
We separately discuss single- and multi-channel techniques developed for the front-end and back-end of speech recognition systems, as well as joint front-end and back-end training frameworks.  ...  Eliminating the negative effect of non-stationary environmental noise is a long-standing research topic for automatic speech recognition that still remains an important challenge.  ...  After that, the back-end techniques for single-channel speech can be applied to this enhanced data for speech recognition.  ... 
arXiv:1705.10874v3 fatcat:evdhqnj7eraa5jiolakuf4mf3e

Features and Model Adaptation Techniques for Robust Speech Recognition: A Review

Kapang Legoh, Utpal Bhattacharjee, T. Tuithung
2015 Communications on Applied Electronics  
This paper may be useful as a tutorial and review of state-of-the-art techniques for feature selection, feature normalization and model adaptation in the development of robust speech recognition  ...  In this paper, major speech features used in state-of-the-art speech recognition research are reviewed.  ...  It improves the performance of a recognizer in the presence of convolutional and additive noise. It is also used for enhancement of noisy speech.  ... 
doi:10.5120/cae-1507 fatcat:cbvzysewanet7jmfqmzluhwvpy
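The excerpt does not name the technique it credits with robustness to convolutional and additive noise, but cepstral mean normalization (CMN) is the classic feature normalization with exactly those properties: a fixed channel becomes a constant offset in the cepstral domain, so subtracting the per-utterance mean removes it. A minimal sketch, under that assumption:

```python
import numpy as np

def cepstral_mean_normalization(cepstra):
    """Subtract the per-utterance mean from each cepstral coefficient.

    A time-invariant channel (convolutional noise) multiplies the spectrum,
    hence adds a constant in the log/cepstral domain; mean subtraction
    removes that constant along with any static bias.
    """
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# A constant channel offset applied to every frame of a toy utterance.
rng = np.random.default_rng(5)
clean = rng.standard_normal((200, 13))   # frames x cepstral coefficients
channel = rng.standard_normal(13)        # fixed channel, one offset per coefficient
distorted = clean + channel
```

After normalization the channel-distorted features coincide with the normalized clean features, which is why CMN helps a recognizer under channel mismatch.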

Exemplar-based joint channel and noise compensation

Jort F. Gemmeke, Tuomas Virtanen, Kris Demuynck
2013 2013 IEEE International Conference on Acoustics, Speech and Signal Processing  
Building on a compositional model that models noisy speech as a combination of noise and speech atoms, the first model iteratively estimates a filter to best compensate the mismatch with the observed  ...  In this paper, two models for channel estimation in exemplar-based noise-robust speech recognition are proposed.  ...  In a feature enhancement method [9], the compositional model is used to obtain clean speech and noise estimates, which are in turn used to define a Wiener filter.  ... 
doi:10.1109/icassp.2013.6637772 dblp:conf/icassp/GemmekeVD13 fatcat:v4cndeq44jh4bemrysr3pa56qm
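The Wiener filter mentioned in the last sentence can be sketched directly from speech and noise power estimates; the toy estimates below are placeholders for the compositional-model outputs:

```python
import numpy as np

def wiener_gain(speech_power, noise_power):
    """Wiener filter gain per time-frequency bin, built from estimates of the
    clean-speech and noise powers. The gain lies in [0, 1] and shrinks bins
    where the noise estimate dominates."""
    return speech_power / (speech_power + noise_power + 1e-10)

# Toy per-bin power estimates standing in for the compositional model's output.
rng = np.random.default_rng(6)
S = rng.uniform(0.0, 1.0, (5, 4)) ** 2   # estimated speech power
N = rng.uniform(0.0, 0.5, (5, 4)) ** 2   # estimated noise power
noisy_mag = np.sqrt(S + N)               # toy noisy magnitude spectrum
enhanced_mag = wiener_gain(S, N) * noisy_mag
```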