
Speaker normalized spectral subband parameters for noise robust speech recognition

Satoru Tsuge, Toshiaki Fukada, Harald Singer, Kuldip K. Paliwal
1999 Journal of the Acoustical Society of Japan (E)  
SSCs are computed as frequency centroids for each subband from the power spectrum of the speech signal.  ...  Experimental results on spontaneous speech recognition show that the speaker normalized SSCs are more useful as supplementary features for improving the recognition performance than the conventional SSCs  ...  In order to use SSCs for speaker-independent tasks, we incorporate a speaker normalization technique into SSC computation to reduce the speaker variability.  ...
doi:10.1250/ast.20.425 fatcat:l34apvgutbhnhhsxqwscdfmdiu

VTLN Through Frequency Warping Based on Pitch

C. Lopes, F. Perdigão
2003 Journal of Communication and Information Systems  
This procedure aims to reduce the inter-speaker variability of speech signals in order to obtain a robust automatic speech recognition system.  ...  Inter-speaker variability removal is performed by a traditional speaker normalization method, which consists in expanding or compressing the Mel filterbank bandwidths, in order to normalize the Vocal Tract  ...
doi:10.14209/jcis.2003.10 fatcat:4beabeo26ngr5pd4r7f6aan3qq

Pitch Mean Based Frequency Warping [chapter]

Jian Liu, Thomas Fang Zheng, Wenhu Wu
2006 Lecture Notes in Computer Science  
In this paper, a novel pitch mean based frequency warping (PMFW) method is proposed to reduce the pitch variability in speech signals at the front end of speech recognition.  ...  The warp factors used in this process are calculated based on the average pitch of a speech segment.  ...  In [8, 9], the formant-based frequency warping was discussed for speaker normalization. However, the motivation of this paper is not only implementing speaker normalization.  ...
doi:10.1007/11939993_13 fatcat:f7xquuyfm5e3jmfkea3qqgbcmu

Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization

Umit H. Yapanel, John H. L. Hansen
2008 EURASIP Journal on Audio, Speech, and Music Processing  
A proven method for achieving effective automatic speech recognition (ASR) due to speaker differences is to perform acoustic feature speaker normalization.  ...  Evaluations are performed for (i) an in-car extended digit recognition task, where an on-the-fly BISN implementation reduces the relative word error rate (WER) by 24%, and (ii) for a diverse noisy speech  ...  There are different ways to address speaker variability for automatic speech recognition.  ... 
doi:10.1155/2008/148967 fatcat:wp35yfj77bfudkns3j5cy7gxtu

A new perceptually motivated MVDR-based acoustic front-end (PMVDR) for robust automatic speech recognition

Umit H. Yapanel, John H.L. Hansen
2008 Speech Communication  
Acoustic feature extraction from speech constitutes a fundamental component of automatic speech recognition (ASR) systems.  ...  The effectiveness of the PMVDR approach is demonstrated by comparing speech recognition accuracies with the traditional MFCC front-end and recently proposed PMCC front-end in both noise-free and real adverse  ...  Watson Research Center for his helpful discussions during stages of this research. We thank Bryan Pellom, formerly of University of Colorado for helpful discussions on the SONIC recognizer.  ... 
doi:10.1016/j.specom.2007.07.006 fatcat:rwvrphja5jal3jwck44ffnzj5y

A frequency warping approach to speaker normalization

L. Lee, R. Rose
1998 IEEE Transactions on Speech and Audio Processing  
In an effort to reduce the degradation in speech recognition performance caused by variation in vocal tract shape among speakers, a frequency warping approach to speaker normalization is investigated.  ...  Index Terms-Continuous speech recognition, frequency warping, hidden Markov modeling, speaker normalization.  ...  The frequency warping approach to speaker normalization was compared to other simple methods for reducing the effects of speaker and channel variability on speech recognition performance.  ... 
doi:10.1109/89.650310 fatcat:i2nqunjuorcmnanr5z4dz6tmoq

Comparison of feature extraction and normalization methods for speaker recognition using grid-audiovisual database

Musab T. S. Al-Kaltakchi, Haithem Abd Al-Raheem Taha, Mohanad Abd Shehab, Mohamed A.M. Abdullah
2020 Indonesian Journal of Electrical Engineering and Computer Science  
In this paper, different feature extraction and feature normalization methods are investigated for speaker recognition.  ...  With a view to give a good representation of acoustic speech signals, Power Normalized Cepstral Coefficients (PNCCs) and Mel Frequency Cepstral Coefficients (MFCCs) are employed for feature extraction.  ...  In order to build a robust speaker recognition system, the effect of feature extraction method should be investigated.  ...
doi:10.11591/ijeecs.v18.i2.pp782-789 fatcat:kxax3nzj55h7feegoiucelw4yi

Spectral Modification Based Data Augmentation For Improving End-to-End ASR For Children's Speech [article]

Vishwanath Pratap Singh, Hardik Sailor, Supratik Bhattacharya, Abhishek Pandey
2022 arXiv   pre-print
Training a robust Automatic Speech Recognition (ASR) system for children's speech recognition is a challenging task due to inherent differences in acoustic attributes of adult and child speech and scarcity  ...  In this paper, a novel segmental spectrum warping and perturbations in formant energy are introduced, to generate a children-like speech spectrum from that of an adult's speech spectrum.  ...  However, the performance of such systems for children's speech suffers from the large inter-speaker variability due to differing rates of growth, and intra-speaker variability due to undeveloped pronunciation  ... 
arXiv:2203.06600v1 fatcat:6lursxhtdrg5vfyx3i4g2ks2fm

Robust recognition of children's speech

A. Potamianos, S. Narayanan
2003 IEEE Transactions on Speech and Audio Processing  
Such variabilities pose challenges for robust automatic recognition of children's speech.  ...  A speaker normalization algorithm that combines frequency warping and model transformation is shown to reduce acoustic variability and significantly improve ASR performance for children speakers (by 25  ...  Lee at the University of Southern California, for discussions and help related to this work. Most of this work was done when the authors were with AT&T Labs-Research.  ... 
doi:10.1109/tsa.2003.818026 fatcat:fntbxcw2qzgstf5jzmsvcop43q

Speech-Signal-Based Frequency Warping

Kuldip Paliwal, Benjamin Shannon, James Lyons, Kamil Wojcicki
2009 IEEE Signal Processing Letters  
The speech-signal-based frequency warping is obtained by considering equal area portions of the log spectrum.  ...  The warping is then used in filterbank design for automatic speech recognition experiments.  ...  for correct speech recognition.  ... 
doi:10.1109/lsp.2009.2014096 fatcat:fphxdeafqrejph4yv7emihxdf4

A Comparative Study of Feature and Score Normalization for Speaker Verification [chapter]

Rong Zheng, Shuwu Zhang, Bo Xu
2005 Lecture Notes in Computer Science  
In this paper, two stages of normalization techniques, feature normalization and score normalization, are examined for decreasing the mismatch between training and testing acoustic conditions.  ...  the output scores entirely and make the speaker-independent decision threshold more robust under adverse conditions.  ...  Alternatively, robust speech recognition techniques have been introduced to reduce the effect of linear channel and slowly variable additive noise.  ... 
doi:10.1007/11608288_71 fatcat:rmp4p4arbbg3nf5pwtgqbfqyaa

A comparative study of traditional and newly proposed features for recognition of speech under stress

S.E. Bou-Ghazale, J.H.L. Hansen
2000 IEEE Transactions on Speech and Audio Processing  
It is well known that the performance of speech recognition algorithms degrades in the presence of adverse environments where a speaker is under stress, emotion, or Lombard effect.  ...  Finally, the effect of various parameter processing such as fixed versus variable preemphasis, liftering, and fixed versus cepstral mean normalization are studied.  ...  Fig. 1 shows a general speech recognition scenario which considers a variety of speech/speaker distortions, and the three general approaches to robust speech recognition.  ...
doi:10.1109/89.848224 fatcat:4t23jt55kraqnci7u3y32mja4i

Speaker Recognition in Mismatch Conditions: A Feature Level Approach

Sharada V Chougule, Mahesh S. Chavan
2017 International Journal of Image Graphics and Signal Processing  
Mismatch in speech data is one of the major reasons limiting the use of speaker recognition technology in real world applications.  ...  Centroids (SSCs) are used for evaluating the robustness in mismatch conditions.  ...  ACKNOWLEDGMENT The authors would like to thank IIT Guwahati for providing speech database.  ... 
doi:10.5815/ijigsp.2017.04.05 fatcat:dfrgsv2y2rhrpaqac2oatheswy

Feature compensation based on the normalization of vocal tract length for the improvement of emotion-affected speech recognition

Masoud Geravanchizadeh, Elnaz Forouhandeh, Meysam Bashirpour
2021 EURASIP Journal on Audio, Speech, and Music Processing  
In this paper, the vocal tract length normalization method is employed to enhance the robustness of the emotion-affected speech recognition system.  ...  For this purpose, two structures of the speech recognition system based on hybrids of hidden Markov model with Gaussian mixture model and deep neural network are used.  ...  The frequency warping in DCT was employed in speech recognition tasks for speaker normalization [13]. The same approach was utilized by Sheikhan et al.  ...
doi:10.1186/s13636-021-00216-5 fatcat:u6prl46qlvelvdmdzq4m7m3zfa

Adverse Conditions and ASR Techniques for Robust Speech User Interface [article]

Urmila Shrawankar, VM Thakare
2013 arXiv   pre-print
The goal of this research is to increase the robustness of the speech recognition systems with respect to changes in the environment.  ...  The main motivation for Automatic Speech Recognition (ASR) is efficient interfaces to computers, and for the interfaces to be natural and truly useful, it should provide coverage for a large group of users  ...  Models for Auxiliary Parameters [20] Most of speech recognition systems rely on acoustic parameters that represent the speech spectrum, for example cepstral coefficients.  ... 
arXiv:1303.5515v1 fatcat:hxbw6k5konaixorjmaboov2hxa