12,783 Hits in 5.4 sec

Time-frequency masking for large scale robust speech recognition

Yuxuan Wang, Ananya Misra, Kean K. Chin
2015 Interspeech 2015   unpublished
In this paper, we demonstrate its utility as a feature enhancement frontend for large vocabulary conversational speech recognition.  ...  Time-frequency mask estimation has shown considerable success recently.  ...  Acknowledgements The authors thank Arun Narayanan for very useful discussions and for training the MTR-AM.  ... 
doi:10.21437/interspeech.2015-533 fatcat:unfuxcjwkje6blopuc2ryazzn4

Perceptual speech processing and phonetic feature mapping for robust vowel recognition

Linkai Bu, T.-D. Chiueh
2000 IEEE Transactions on Speech and Audio Processing  
They remove imperceptible spectral components, and adjust magnitude and frequency scales of speech spectra, respectively.  ...  The proposed perceptual speech processing is based on three perceptual characteristics and consists of three independent processing steps: masking effect, minimum audible field renormalization, and mel-scale  ...  ACKNOWLEDGMENT The authors would like to thank the anonymous reviewers for their valuable comments and suggestions.  ... 
doi:10.1109/89.824695 fatcat:po3lcn7knrd7plxqgudtcsnhha

Time-Frequency Masking: Linking Blind Source Separation and Robust Speech Recognition [chapter]

Marco Kühne, Roberto Togneri, Sven Nordholm
2008 Speech Recognition  
Time-Frequency Masking: Linking Blind Source Separation and Robust Speech Recognition, Speech Recognition, Technologies and Applications  ...  Time-Frequency Masking: Linking Blind Source Separation and Robust Speech Recognition, Speech Recognition, France Mihelic and Janez Zibert (Ed.), ISBN: 978-953-7619-29-9, InTech, Available from: http:/  ... 
doi:10.5772/6382 fatcat:rfwux35xdrbftf3dktmnjlosay

A psychoacoustical model of the auditory periphery as the front end for ASR

Jürgen Tchorz, Michael Kleinschmidt, Birger Kollmeier
1999 Journal of the Acoustical Society of America  
The application of a psychoacoustical model of the auditory periphery in the field of automatic speech recognition (ASR) is presented.  ...  Speaker-independent, isolated-digit recognition experiments in different types of noise were carried out to evaluate the robustness of the auditory-based ASR system in adverse conditions.  ...  However, to further evaluate the potential of the auditory model in speech recognition systems, experiments with large vocabularies as well as connected word recognition experiments are necessary.  ... 
doi:10.1121/1.425500 fatcat:4bqg5qq26rcx3pg74ztpevpvqm

A perceptual masking approach for noise robust speech recognition

Hari Krishna Maganti, Marco Matassoni
2012 EURASIP Journal on Audio, Speech, and Music Processing  
This article describes a modified technique for enhancing noisy speech to improve automatic speech recognition (ASR) performance.  ...  The performed speech recognition evaluations on the noisy standard AURORA-2 tasks show enhanced performance for all noise conditions.  ...  that lead to improved performance for speech recognition systems.  ... 
doi:10.1186/1687-4722-2012-29 fatcat:azwumrymgfcsbcnuhp3cdfhyhi

Robust speech separation using time-frequency masking

P. Aarabi, Guangji Shi, O. Jahromi
2003 2003 International Conference on Multimedia and Expo. ICME '03. Proceedings (Cat. No.03TH8698)  
A multi-microphone time-frequency speech masking technique is proposed.  ...  This technique utilizes both the time-frequency magnitude and phase information in order to estimate the Signal-to-Noise Ratio (SNR)-maximizing masking coefficients for each time-frequency block given that  ...  Based on this, the time-frequency block for each microphone is scaled by a masking value between zero and one.  ... 
doi:10.1109/icme.2003.1221024 dblp:conf/icmcs/AarabiSJ03 fatcat:dr3t5hniyvbv5mqgj2iev6ah4a
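The snippet above describes scaling each time-frequency block by a masking value between zero and one. A minimal sketch of that scaling step, assuming a generic Wiener-style ratio mask S / (S + N) computed from known speech and noise power, might look like this. It is a swapped-in illustration, not Aarabi et al.'s SNR-maximizing estimator (which works from multi-microphone magnitude and phase), and the function names are hypothetical:

```python
def ratio_mask(speech_power, noise_power):
    """Hypothetical helper: Wiener-style gain S / (S + N) per T-F block.

    Inputs are 2-D lists indexed [frame][frequency] of nonnegative power.
    Every output value lies in [0, 1]; blocks with no energy get 0.
    """
    return [
        [s / (s + n) if s + n > 0 else 0.0 for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_power, noise_power)
    ]


def apply_mask(mixture, mask):
    """Hypothetical helper: scale each T-F block of the mixture by its mask value."""
    return [
        [x * m for x, m in zip(x_row, m_row)]
        for x_row, m_row in zip(mixture, mask)
    ]


speech = [[3.0, 0.0], [1.0, 1.0]]
noise = [[1.0, 2.0], [1.0, 3.0]]
mask = ratio_mask(speech, noise)
print(mask)  # [[0.75, 0.0], [0.5, 0.25]]
print(apply_mask([[4.0, 2.0], [2.0, 4.0]], mask))  # [[3.0, 0.0], [1.0, 1.0]]
```

A soft mask of this kind keeps partial energy in blocks where speech and noise overlap, rather than making a hard keep-or-discard decision per block.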

Noise Robust Feature Scheme for Automatic Speech Recognition Based on Auditory Perceptual Mechanisms

Shang Cai, Yeming Xiao, Jielin Pan, Qingwei Zhao, Yonghong Yan
2012 IEICE Transactions on Information and Systems  
speech recognition, noise robustness, critical bandwidth, frequency masking, temporal masking  ...  task and a more complex large vocabulary continuous speech recognition (LVCSR) task.  ...  In Sect. 4.3, a more complex large vocabulary continuous speech recognition (LVCSR) task is conducted for evaluation.  ... 
doi:10.1587/transinf.e95.d.1610 fatcat:wct5l7hcezekxbueluia7boame

Spectro-temporal modulation energy based mask for robust speaker identification

Tai-Shih Chi, Ting-Han Lin, Chung-Chien Hsu
2012 Journal of the Acoustical Society of America  
Zhang, "Auditory sparse representation for robust speaker recognition based on tensor structure," EURASIP J. Audio, Speech, Music Process. 2008, 578612 (2008)], in low SNR ( 10 dB) conditions.  ...  An algorithm which distinguishes speech from non-speech based on spectro-temporal modulation energies is proposed and evaluated in robust text-independent closed-set speaker identification simulations  ...  Acknowledgments The authors would like to thank the anonymous reviewers for their comments. This work is supported by the National Science Council, Taiwan under Grant No. NSC 100-2220-E-009-004.  ... 
doi:10.1121/1.3697534 pmid:22559454 fatcat:nuu335jmjrhthep76zjtq23tsa

Missing Data Solutions for Robust Speech Recognition [chapter]

Yujun Wang, Jort F. Gemmeke, Kris Demuynck, Hugo Van hamme
2012 Essential Speech and Language Technology for Dutch  
Second, we describe how a state-of-the-art large vocabulary automatic speech recognition (ASR) system based on the prevailing hidden Markov model (HMM) can be made noise robust using conventional MDT.  ...  One of the major concerns when deploying speech recognition applications is the lack of robustness of the technology.  ...  In the spectrogram of noisy speech, MDT distinguishes time-frequency cells that predominantly contain speech or noise energy by introducing a missing data mask.  ... 
doi:10.1007/978-3-642-30910-6_16 dblp:series/tanlp/WangGDh13 fatcat:se3jgvwxejc6rdj74ov6qrpfmm

Noise-Robust ASR for the third 'CHiME' Challenge Exploiting Time-Frequency Masking based Multi-Channel Speech Enhancement and Recurrent Neural Network [article]

Zaihu Pang, Fengyun Zhu
2015 arXiv   pre-print
A time-frequency masking based speech enhancement front-end is proposed to suppress the environmental noise utilizing multi-channel coherence and spatial cues.  ...  based acoustic condition modeling, are carefully integrated into the speech recognition back-end.  ...  ACKNOWLEDGMENT The authors would like to thank Zhiping Zhang, Xiangang Li, Yi Liu and Tong Fu for their kind help.  ... 
arXiv:1509.07211v1 fatcat:5ysnjlvclzd3tifjaulwssmutq

CASA-Based Robust Speaker Identification

Xiaojia Zhao, Yang Shao, DeLiang Wang
2012 IEEE Transactions on Audio, Speech, and Language Processing  
As a primary topic in speaker recognition, speaker identification (SID) aims to identify the underlying speaker(s) given a speech utterance.  ...  Existing approaches address this problem from different perspectives such as proposing robust speaker features, introducing noise to clean speaker models, and using speech enhancement methods to restore  ...  segregates speech from interference by producing a time-frequency mask. This dissertation aims to address the SID robustness problem in the CASA framework.  ... 
doi:10.1109/tasl.2012.2186803 fatcat:5wqehvp5u5hgrm4dr3utrjvs3u

Robust speaker identification using a CASA front-end

Xiaojia Zhao, Yang Shao, DeLiang Wang
2011 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
Inspired by auditory perception, computational auditory scene analysis (CASA) typically segregates speech by producing a binary time-frequency mask.  ...  Speaker recognition remains a challenging task under noisy conditions.  ...  The output of CASA segregation is a binary time-frequency (T-F) mask that indicates whether a particular T-F unit is dominated by speech or noise.  ... 
doi:10.1109/icassp.2011.5947596 dblp:conf/icassp/ZhaoSW11 fatcat:fxlq5rtqcnb6nkqst4bnmimlg4
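The snippet above describes the binary T-F mask that CASA segregation outputs, marking each unit as speech- or noise-dominated. As a hedged illustration, the commonly used ideal binary mask can be sketched as follows. It assumes oracle access to separate speech and noise power and a local SNR criterion (`lc_db`, an assumed parameter); it is not the authors' CASA estimator, which must infer the mask from the noisy mixture alone, and the function name is hypothetical:

```python
import math


def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Hypothetical helper: 1 where a T-F unit's local SNR exceeds lc_db, else 0.

    Inputs are 2-D lists indexed [frame][channel] of nonnegative power.
    lc_db is the local criterion in dB; units at or below it count as
    noise-dominated.
    """
    eps = 1e-12  # avoids taking the log of zero for silent units
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        mask.append([
            1 if 10.0 * math.log10((s + eps) / (n + eps)) > lc_db else 0
            for s, n in zip(s_row, n_row)
        ])
    return mask


speech = [[4.0, 1.0], [0.5, 9.0]]
noise = [[1.0, 1.0], [1.0, 1.0]]
print(ideal_binary_mask(speech, noise))  # [[1, 0], [0, 1]]
```

Lowering `lc_db` (for example to -5 dB) labels more units as speech-dominated, trading residual noise for fewer discarded speech units.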

A model of dynamic auditory perception and its application to robust word recognition

B. Strope, A. Alwan
1997 IEEE Transactions on Speech and Audio Processing  
Index Terms: Dynamic auditory perception, forward masking, robust speech recognition.  ...  An initial evaluation of the dynamic model together with a local peak isolation mechanism as a front end for dynamic time warping (DTW) and hidden Markov model (HMM) word recognition systems shows an improvement  ...  Morgan and four anonymous reviewers for their helpful suggestions on a previous version of this manuscript.  ... 
doi:10.1109/89.622569 fatcat:kapj3wejjnhuvbzi324qr5vsmy

Binaural sound source separation motivated by auditory processing

Chanwoo Kim, Kshitiz Kumar, Richard M. Stern
2011 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
In this paper we present a new method of signal processing for robust speech recognition using two microphones.  ...  We develop a spatial masking function based on normalized cross-correlation, which provides rejection of off-axis interfering signals.  ...  [20] ), we have observed that power flooring (i.e. the imposition of a minimum power) is very important for robust speech recognition.  ... 
doi:10.1109/icassp.2011.5947497 dblp:conf/icassp/KimKS11 fatcat:vgexmedfmnfitn3bghbuigw34q

Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments [article]

Zixing Zhang, Jürgen Geiger, Jouni Pohjalainen, Amr El-Desoky Mousa, Wenyu Jin, Björn Schuller
2018 arXiv   pre-print
those involved in the development of environmentally robust speech recognition systems.  ...  Eliminating the negative effect of non-stationary environmental noise is a long-standing research topic for automatic speech recognition that still remains an important challenge.  ...  and designed for Large Vocabulary Continuous Speech Recognition (LVCSR).  ... 
arXiv:1705.10874v3 fatcat:evdhqnj7eraa5jiolakuf4mf3e
Showing results 1-15 out of 12,783 results