Speaker normalized spectral subband parameters for noise robust speech recognition
1999
Journal of the Acoustical Society of Japan (E)
SSCs are computed as frequency centroids for each subband from the power spectrum of the speech signal. ...
Experimental results on spontaneous speech recognition show that the speaker normalized SSCs are more useful as supplementary features for improving the recognition performance than the conventional SSCs ...
In order to use SSCs for speaker-independent tasks, we incorporate a speaker normalization technique into SSC computation to reduce the speaker variability. ...
doi:10.1250/ast.20.425
fatcat:l34apvgutbhnhhsxqwscdfmdiu
VTLN Through Frequency Warping Based on Pitch
2003
Journal of Communication and Information Systems
This procedure aims to reduce the inter-speaker variability of speech signals in order to obtain a robust automatic speech recognition system. ...
Inter-speaker variability removal is performed by a traditional speaker normalization method, which consists in expanding or compressing the Mel filterbank bandwidths, in order to normalize the Vocal Tract ...
doi:10.14209/jcis.2003.10
fatcat:4beabeo26ngr5pd4r7f6aan3qq
Pitch Mean Based Frequency Warping
[chapter]
2006
Lecture Notes in Computer Science
In this paper, a novel pitch mean based frequency warping (PMFW) method is proposed to reduce the pitch variability in speech signals at the frontend of speech recognition. ...
The warp factors used in this process are calculated based on the average pitch of a speech segment. ...
In [8, 9], the formant-based frequency warping was discussed for speaker normalization. However, the motivation of this paper is not only implementing speaker normalization. ...
doi:10.1007/11939993_13
fatcat:f7xquuyfm5e3jmfkea3qqgbcmu
Towards an Intelligent Acoustic Front End for Automatic Speech Recognition: Built-in Speaker Normalization
2008
EURASIP Journal on Audio, Speech, and Music Processing
A proven method for achieving effective automatic speech recognition (ASR) due to speaker differences is to perform acoustic feature speaker normalization. ...
Evaluations are performed for (i) an in-car extended digit recognition task, where an on-the-fly BISN implementation reduces the relative word error rate (WER) by 24%, and (ii) for a diverse noisy speech ...
There are different ways to address speaker variability for automatic speech recognition. ...
doi:10.1155/2008/148967
fatcat:wp35yfj77bfudkns3j5cy7gxtu
A new perceptually motivated MVDR-based acoustic front-end (PMVDR) for robust automatic speech recognition
2008
Speech Communication
Acoustic feature extraction from speech constitutes a fundamental component of automatic speech recognition (ASR) systems. ...
The effectiveness of the PMVDR approach is demonstrated by comparing speech recognition accuracies with the traditional MFCC front-end and recently proposed PMCC front-end in both noise-free and real adverse ...
Watson Research Center for his helpful discussions during stages of this research. We thank Bryan Pellom, formerly of University of Colorado for helpful discussions on the SONIC recognizer. ...
doi:10.1016/j.specom.2007.07.006
fatcat:rwvrphja5jal3jwck44ffnzj5y
A frequency warping approach to speaker normalization
1998
IEEE Transactions on Speech and Audio Processing
In an effort to reduce the degradation in speech recognition performance caused by variation in vocal tract shape among speakers, a frequency warping approach to speaker normalization is investigated. ...
Index Terms: Continuous speech recognition, frequency warping, hidden Markov modeling, speaker normalization. ...
The frequency warping approach to speaker normalization was compared to other simple methods for reducing the effects of speaker and channel variability on speech recognition performance. ...
doi:10.1109/89.650310
fatcat:i2nqunjuorcmnanr5z4dz6tmoq
Comparison of feature extraction and normalization methods for speaker recognition using grid-audiovisual database
2020
Indonesian Journal of Electrical Engineering and Computer Science
In this paper, different feature extraction and feature normalization methods are investigated for speaker recognition. ...
With a view to give a good representation of acoustic speech signals, Power Normalized Cepstral Coefficients (PNCCs) and Mel Frequency Cepstral Coefficients (MFCCs) are employed for feature extraction. ...
In order to build a robust speaker recognition system, the effect of feature extraction method should be investigated. ...
doi:10.11591/ijeecs.v18.i2.pp782-789
fatcat:kxax3nzj55h7feegoiucelw4yi
Spectral Modification Based Data Augmentation For Improving End-to-End ASR For Children's Speech
[article]
2022
arXiv
pre-print
Training a robust Automatic Speech Recognition (ASR) system for children's speech recognition is a challenging task due to inherent differences in acoustic attributes of adult and child speech and scarcity ...
In this paper, a novel segmental spectrum warping and perturbations in formant energy are introduced, to generate a children-like speech spectrum from that of an adult's speech spectrum. ...
However, the performance of such systems for children's speech suffers from the large inter-speaker variability due to differing rates of growth, and intra-speaker variability due to undeveloped pronunciation ...
arXiv:2203.06600v1
fatcat:6lursxhtdrg5vfyx3i4g2ks2fm
Robust recognition of children's speech
2003
IEEE Transactions on Speech and Audio Processing
Such variabilities pose challenges for robust automatic recognition of children's speech. ...
A speaker normalization algorithm that combines frequency warping and model transformation is shown to reduce acoustic variability and significantly improve ASR performance for children speakers (by 25 ...
Lee at the University of Southern California, for discussions and help related to this work. Most of this work was done when the authors were with AT&T Labs-Research. ...
doi:10.1109/tsa.2003.818026
fatcat:fntbxcw2qzgstf5jzmsvcop43q
Speech-Signal-Based Frequency Warping
2009
IEEE Signal Processing Letters
The speech-signal-based frequency warping is obtained by considering equal area portions of the log spectrum. ...
The warping is then used in filterbank design for automatic speech recognition experiments. ...
for correct speech recognition. ...
doi:10.1109/lsp.2009.2014096
fatcat:fphxdeafqrejph4yv7emihxdf4
A Comparative Study of Feature and Score Normalization for Speaker Verification
[chapter]
2005
Lecture Notes in Computer Science
In this paper, two stages of normalization techniques, feature normalization and score normalization, are examined for decreasing the mismatch between training and testing acoustic conditions. ...
the output scores entirely and make the speaker-independent decision threshold more robust under adverse conditions. ...
Alternatively, robust speech recognition techniques have been introduced to reduce the effect of linear channel and slowly variable additive noise. ...
doi:10.1007/11608288_71
fatcat:rmp4p4arbbg3nf5pwtgqbfqyaa
A comparative study of traditional and newly proposed features for recognition of speech under stress
2000
IEEE Transactions on Speech and Audio Processing
It is well known that the performance of speech recognition algorithms degrades in the presence of adverse environments where a speaker is under stress, emotion, or Lombard effect. ...
Finally, the effect of various parameter processing such as fixed versus variable preemphasis, liftering, and fixed versus cepstral mean normalization are studied. ...
Fig. 1 shows a general speech recognition scenario which considers a variety of speech/speaker distortions, and the three general approaches to robust speech recognition. ...
doi:10.1109/89.848224
fatcat:4t23jt55kraqnci7u3y32mja4i
Speaker Recognition in Mismatch Conditions: A Feature Level Approach
2017
International Journal of Image Graphics and Signal Processing
Mismatch in speech data is one of the major reasons limiting the use of speaker recognition technology in real world applications. ...
Centroids (SSCs) are used for evaluating the robustness in mismatch conditions. ...
ACKNOWLEDGMENT The authors would like to thank IIT Guwahati for providing speech database. ...
doi:10.5815/ijigsp.2017.04.05
fatcat:dfrgsv2y2rhrpaqac2oatheswy
Feature compensation based on the normalization of vocal tract length for the improvement of emotion-affected speech recognition
2021
EURASIP Journal on Audio, Speech, and Music Processing
In this paper, the vocal tract length normalization method is employed to enhance the robustness of the emotion-affected speech recognition system. ...
For this purpose, two structures of the speech recognition system based on hybrids of hidden Markov model with Gaussian mixture model and deep neural network are used. ...
The frequency warping in DCT was employed in speech recognition tasks for speaker normalization [13]. The same approach was utilized by Sheikhan et al. ...
doi:10.1186/s13636-021-00216-5
fatcat:u6prl46qlvelvdmdzq4m7m3zfa
Adverse Conditions and ASR Techniques for Robust Speech User Interface
[article]
2013
arXiv
pre-print
The goal of this research is to increase the robustness of the speech recognition systems with respect to changes in the environment. ...
The main motivation for Automatic Speech Recognition (ASR) is efficient interfaces to computers, and for the interfaces to be natural and truly useful, it should provide coverage for a large group of users ...
Models for Auxiliary Parameters [20] Most of speech recognition systems rely on acoustic parameters that represent the speech spectrum, for example cepstral coefficients. ...
arXiv:1303.5515v1
fatcat:hxbw6k5konaixorjmaboov2hxa