577 Hits in 1.7 sec

Combination of two-dimensional cochleogram and spectrogram features for deep learning-based ASR

Andros Tjandra, Sakriani Sakti, Graham Neubig, Tomoki Toda, Mirna Adriani, Satoshi Nakamura
2015 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
Performance was evaluated in the framework of a hybrid neural network-hidden Markov model (NN-HMM) system on the TIMIT phoneme sequence recognition task.  ...  The best accuracy was obtained by high-level combination of two-dimensional cochleogram-spectrogram features using a CNN, achieving up to 8.2% relative phoneme error rate (PER) reduction over CNN single features.  ...  Then, we construct hybrid DNN-HMM and CNN-HMM systems for the phoneme sequence recognition task. Here the HMM tied triphone states are used as the neural network target classes.  ...
doi:10.1109/icassp.2015.7178827 dblp:conf/icassp/TjandraSNTAN15 fatcat:rwpcq7xdwnenlhlf6rtoyaseiy

A Novel Noise Immune, Fuzzy Approach to Speaker Independent, Isolated Word Speech Recognition

Ramin Halavati, Saeed Bagheri Shouraki, Mina Razaghpour, Hossein Tajik, Arpineh Cholakian
2006 2006 World Automation Congress  
The task is based on conversion of the speech spectrogram into a linguistic fuzzy description and comparison of this representation with similar linguistic descriptions of words.  ...  The method is tested and compared with a widely used speech recognition approach and has shown significantly higher robustness to noise.  ...  compared with word recognition using phonemes, resulting in more possible discrimination.  ...
doi:10.1109/wac.2006.376025 fatcat:7ucs5lkdbnhxzg3vlk4ckw5zjq

Modelling human speech recognition in challenging noise maskers using machine learning

Birger Kollmeier, Constantin Spille, Angel Mario Castro Martínez, Stephan D. Ewert, Bernd T. Meyer
2020 Acoustical Science and Technology  
signal (as is the case for most auditory model-based SRT predictions).  ...  In modulated noise, it is shown that the DNN is listening in the dips.  ...  The algorithm traces back the activations from the output (phoneme) layer to the input (spectrogram) layer.  ...
doi:10.1250/ast.41.94 fatcat:uqf7hko6tjdc3ah4w4mcvh723e

A Multistream Feature Framework Based on Bandpass Modulation Filtering for Robust Speech Recognition

Sridhar Krishna Nemala, Kailash Patil, Mounya Elhilali
2013 IEEE Transactions on Audio, Speech, and Language Processing  
We adapt such duality in a multistream framework for robust speaker-independent phoneme recognition.  ...  The proposed architecture results in substantial improvements over standard and state-of-the-art feature schemes for phoneme recognition, particularly in the presence of nonstationary noise, reverberation  ...  Note that the hybrid HMM/MLP framework overcomes some of the limitations of standard HMM/GMM systems [32] and achieves better phoneme recognition performance [34], in addition to having advantages  ...
doi:10.1109/tasl.2012.2219526 pmid:29928166 pmcid:PMC6005699 fatcat:vejxooppfbamtgekhp4ulpfxqy

Development of Hausa Acoustic Model for Speech Recognition

Umar Adam Ibrahim, Moussa Mahamat Boukar, Muhammad Aliyu Suleiman
2022 International Journal of Advanced Computer Science and Applications  
In this regard, this research is concerned with the development of a Hausa acoustic model for automatic speech recognition.  ...  This is done by creating a word-level phoneme dataset from the Hausa speech corpus database and then implementing a deep learning algorithm for acoustic modeling.  ...  To see how successfully the model classified each auditory word in the test set, a confusion matrix was plotted, as displayed in Fig. 8.  ...
doi:10.14569/ijacsa.2022.0130559 fatcat:v7hbdf6pi5d5lizfsvsytt6mkq

On the relevance of auditory-based Gabor features for deep learning in robust speech recognition

Angel Mario Castro Martinez, Sri Harish Mallidi, Bernd T. Meyer
2017 Computer Speech and Language  
Previous studies support the idea of merging auditory-based Gabor features with deep learning architectures to achieve robust automatic speech recognition; however, the cause behind the gain of such a combination  ...  To explain the results, a measure of similarity between phoneme classes from DNN activations is proposed and linked to their acoustic properties.  ...  Acknowledgment This work was funded by the DFG (Cluster of Excellence 1077/1 Hearing4All (http://hearing4all.eu) and the SFB/TRR 31 "The Active Auditory System" (http://www.sfb-trr31.unioldenburg.de/))  ...
doi:10.1016/j.csl.2017.02.006 fatcat:7fdlydj3bja5hlo4ttdm2sf2ta

Recognition of human speech phonemes using a novel fuzzy approach

Ramin Halavati, Saeed Bagheri Shouraki, Saman Harati Zadeh
2007 Applied Soft Computing  
To do so, the speech spectrogram is converted into a fuzzy linguistic description, and this description is used instead of precise acoustic features.  ...  or noise-robust recognition.  ...  The benchmark system is an HMM-based isolated phoneme recognition system with MFCC features [1, 7, 21, 22].  ...
doi:10.1016/j.asoc.2006.02.007 fatcat:7aju4mua7rd6foolb2raz26suy

Recognizing the message and the messenger: biomimetic spectral analysis for robust speech and speaker recognition

Sridhar Krishna Nemala, Kailash Patil, Mounya Elhilali
2012 International Journal of Speech Technology  
However, most speech processing systems, like automatic speech and speaker recognition systems, suffer from a significant drop in performance when speech signals are corrupted with unseen background distortions  ...  such as speech and speaker recognition.  ...  Parts of this analysis have been presented in (Nemala et al. 2012).  ...
doi:10.1007/s10772-012-9184-y pmid:26412979 pmcid:PMC4579853 fatcat:3pdgp2tmsrbyleuw34vylnu3ay

Using Teager Energy Cepstrum and HMM Distances in Automatic Speech Recognition and Analysis of Unvoiced Speech

Panikos Heracleous
2009 Zenodo  
In this study, further analysis of the NAM speech has been made using distance measures between hidden Markov model (HMM) pairs.  ...  In this study, the use of silicon NAM (Non-Audible Murmur) microphone in automatic speech recognition is presented.  ...  Phoneme recognition experiment To evaluate the performance of NAM microphones and investigate the relationship between the HMM distance measures and the phoneme recognition accuracy, a phoneme recognition  ... 
doi:10.5281/zenodo.1055941 fatcat:ofistliqpfek5hwe23waxwwid4

Real-Time Speech Visualization System: KanNon - Applying Auditory Characteristics

Ken Nakamuro, Katsuhiro Haruki, Sueo Sugimoto
2005 Zenodo  
Publication in the conference proceedings of EUSIPCO, Antalya, Turkey, 2005.  ...  The TDNN achieved a higher recognition rate in phoneme recognition than the HMM in [5].  ...  In the previous KanNon system, we built the speech recognition component using the Microsoft Speech API, which is based on the Hidden Markov Model (HMM).  ...
doi:10.5281/zenodo.39288 fatcat:yottye4vdfgqzllakh6fhr3kni

Investigation of DNN-HMM and Lattice Free Maximum Mutual Information Approaches for Impaired Speech Recognition

S. Vishnika Veni, S. Chandrakala
2021 IEEE Access  
The recognition accuracy is evaluated and compared using two datasets, namely 20 acoustically similar words and a 50-word Impaired Speech Corpus in Tamil.  ...  Impaired speakers have difficulty in pronouncing words, which results in partial or incomplete speech content.  ...  A bidirectional Deep Recurrent Neural Network (biRNN)-based DNN-HMM is used for phoneme recognition [15].  ...
doi:10.1109/access.2021.3129847 fatcat:t4nx6vf32rdcbpmnduhtlqkp4m

Phoneme recognition using spectral envelope and modulation frequency features

Samuel Thomas, Sriram Ganapathy, Hynek Hermansky
2009 2009 IEEE International Conference on Acoustics, Speech and Signal Processing  
These features are combined at the phoneme posterior level and used as features for a hybrid HMM-ANN phoneme recognizer.  ...  We present a new feature extraction technique for phoneme recognition that uses short-term spectral envelope and modulation frequency features.  ...  In our case, the auditory spectrogram, which is a two-dimensional representation of the input signal, is obtained by stacking the subband temporal envelopes in frequency (similar to the stacking of short-term  ... 
doi:10.1109/icassp.2009.4960618 dblp:conf/icassp/ThomasGH09 fatcat:ze4m2l5b5bhx3c2mzume5oosiu

Toward optimizing stream fusion in multistream recognition of speech

Nima Mesgarani, Samuel Thomas, Hynek Hermansky
2011 Journal of the Acoustical Society of America  
Results on phoneme recognition from noisy speech indicate the effectiveness of the proposed method.  ...  A multistream phoneme recognition framework is proposed based on forming streams from different spectrotemporal modulations of speech.  ...  A phoneme recognition system based on the Hidden Markov Model-Artificial Neural Network (HMM-ANN) paradigm (Bourlard and Morgan, 1994), trained on clean speech using the TIMIT database.  ...
doi:10.1121/1.3595744 pmid:21786862 fatcat:axh4yavezjgonbao2qraaot4lu

A speech recognition method based on the sequential multi-layer perceptrons

Wen-Yuan Chen, Sin-Horng Chen, Cheng-Jung Lin
1996 Neural Networks  
A novel multi-layer perceptron (MLP)-based speech recognition method is proposed in this study.  ...  In this method, the dynamic time warping capability of hidden Markov models (HMM) is directly combined with the discriminant-based learning of MLPs for the sake of employing a sequence of MLPs (SMLP) as  ...  Spectrograms and related outputs of the MLPs for these utterances are shown in parts (a) and (b). A Mandarin digit /~// was misrecognized as /r,/: (a) spectrogram; (b) output values of MLPI corresponding to phonemes  ...
doi:10.1016/0893-6080(95)00140-9 fatcat:gcwq3k7j6rbpjf2ql7wjuq67ga

Speaker-independent isolated digit recognition using an AER silicon cochlea

Mohammad Abdollahi, Shih-Chii Liu
2011 2011 IEEE Biomedical Circuits and Systems Conference (BioCAS)  
In fact, it is shown that despite the limited input dynamic range and the un-modelled nonlinearities produced by the hardware cochlea, the discriminative information present in its spike patterns can potentially  ...  be sufficient for a task as complex as speaker-independent isolated keyword recognition.  ...  95.08%; Low-pass filtered spike trains - SVM: 95.58%; Radon spike counts - SVM: 93.79%; MFCC+Delta - SVM: 96.83%; Auditory spectrogram - SVM: 78.73%; MFCC - HMM: 99.70%  ...
doi:10.1109/biocas.2011.6107779 fatcat:n6rmyt4oyfdk5gbqp2dwimqwv4