
Automatic Recognition of Pathological Phoneme Production

Robert Wielgat, Tomasz P. Zieliński, Tomasz Woźniak, Stanisław Grabias, Daniel Król
2008 Folia Phoniatrica et Logopaedica  
Results obtained by DTW methods, mainly by the modified phoneme-based DTW classifier, were slightly better in comparison with the HMM classifier.  ...  The HMM classifier was based on whole-word models as well as phoneme models. Results present a comparative analysis of DTW and HMM methods.  ...  Classification via Phoneme-Based DTW: The standard DTW method described in the previous section is based on whole-word models.  ... 
doi:10.1159/000170083 pmid:19011305 fatcat:bg4czeujjrdnnouvsrwgjicgwa
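(Background sketch, not from the paper above: the entry compares whole-word and phoneme-based DTW classifiers with HMMs. A minimal dynamic time warping distance between two feature sequences, with illustrative names chosen here, could look like the following; phoneme-based DTW applies the same recurrence to per-phoneme segments instead of whole-word templates.)

```python
import numpy as np

def dtw_distance(x, y):
    """Classic DTW between two feature sequences x (n, d) and y (m, d).

    Returns the accumulated alignment cost using Euclidean frame distances
    and the standard match / insertion / deletion step pattern.
    """
    x, y = np.asarray(x, float), np.asarray(y, float)
    n, m = len(x), len(y)
    # cost[i, j] = best cost of aligning x[:i] with y[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(x[i - 1] - y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

if __name__ == "__main__":
    # Toy usage: two MFCC-like sequences of different lengths.
    a = np.random.randn(40, 13)   # e.g. 40 frames of 13-dim features
    b = np.random.randn(55, 13)
    print(dtw_distance(a, b))
```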

Latent perceptual mapping with data-driven variable-length acoustic units for template-based speech recognition

Shiva Sundaram, Jerome R. Bellegarda
2012 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
This extension is experimentally validated on a context-independent phoneme classification task using the TIMIT corpus.  ...  In recent work, we introduced Latent Perceptual Mapping (LPM) [1], a new framework for acoustic modeling suitable for template-like speech recognition.  ...  Direct DTW-based classification using complete phoneme segments on quantized sequences results in an average performance of ...  Proposed LPM classification performance.  ... 
doi:10.1109/icassp.2012.6288826 dblp:conf/icassp/SundaramB12 fatcat:ragphlsztzbnhfj76ulyiayqzu

Speech Recognition System and Isolated Word Recognition based on Hidden Markov Model (HMM) for Hearing Impaired

S. Ananthi, P. Dhanalakshmi
2013 International Journal of Computer Applications  
Based upon the state probabilities, it generates a possible word sequence for the spoken word. Instead of listening to the speech, the generated sequence of text can easily be viewed.  ...  The usual methods used in Speech Recognition (SR) are Neural Networks, Hidden Markov Models (HMM) and Dynamic Time Warping (DTW). The most widely used technique for Speech Recognition is HMM.  ...  DTW still carries a heavy computational load and treats durational variations as noise to be eliminated via time normalization.  ... 
doi:10.5120/13012-0241 fatcat:2wyxwat2qre33cxlc4ygadtr2i
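(Background sketch: the snippet above notes that the HMM generates the most probable word/state sequence from state probabilities. A generic Viterbi decoder over a discrete HMM, with toy variable names assumed here rather than taken from the paper, illustrates that step.)

```python
import numpy as np

def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely state path for an observation sequence under a discrete HMM.

    obs     : sequence of observation indices
    start_p : (S,) initial state probabilities
    trans_p : (S, S) transition probabilities
    emit_p  : (S, O) emission probabilities
    Returns (best_path, best_log_prob).
    """
    start_p, trans_p, emit_p = map(np.asarray, (start_p, trans_p, emit_p))
    S, T = len(start_p), len(obs)
    logv = np.full((T, S), -np.inf)     # best log-prob of a path ending in state s at time t
    back = np.zeros((T, S), dtype=int)  # backpointers to the best predecessor state
    logv[0] = np.log(start_p) + np.log(emit_p[:, obs[0]])
    for t in range(1, T):
        for s in range(S):
            scores = logv[t - 1] + np.log(trans_p[:, s])
            back[t, s] = np.argmax(scores)
            logv[t, s] = scores[back[t, s]] + np.log(emit_p[s, obs[t]])
    # Trace the best path backwards from the most probable final state.
    path = [int(np.argmax(logv[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1], float(np.max(logv[-1]))
```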

Experimental studies on effect of speaking mode on spoken term detection

Kallola Rout, Pappagari Raghavendra Reddy, K Sri Rama Murty
2015 2015 Twenty First National Conference on Communications (NCC)  
Matching is done using subsequence dynamic time warping (DTW) on posterior features of the query and reference utterances, obtained by training a multilayer perceptron (MLP).  ...  Durations of phonemes in query words vary greatly between these two modes. Hence the pattern matching stage, which takes care of temporal variations, plays a crucial role.  ...  A comparison study is carried out by taking different numbers of phoneme classes for classification. For phoneme recognition we used both HMM and MLP; the results are shown in Table II.  ... 
doi:10.1109/ncc.2015.7084926 dblp:conf/ncc/RoutRM15 fatcat:slgcvsjggjdr7ewkq6y5rbgb5q
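(Background sketch: the entry above matches query and reference utterances with subsequence DTW over MLP posterior features. The sketch below, with assumed names and a Euclidean local cost, lets the query start and end anywhere inside the reference; posteriorgram-based systems often use a negative-log inner-product local cost instead.)

```python
import numpy as np

def subsequence_dtw(query, reference):
    """Subsequence DTW: align a short query against the best-matching
    region of a longer reference.

    query, reference : arrays of shape (n, d) and (m, d)
    Returns (best_cost, end_index), where end_index is the reference frame
    at which the best-matching subsequence ends.
    """
    q, r = np.asarray(query, float), np.asarray(reference, float)
    n, m = len(q), len(r)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, :] = 0.0   # the match may start at any reference frame
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(q[i - 1] - r[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    end = int(np.argmin(cost[n, 1:])) + 1   # the match may end at any reference frame
    return float(cost[n, end]), end
```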

Speech Processing for Language Learning: A Practical Approach to Computer-Assisted Pronunciation Teaching

Natalia Bogach, Elena Boitsova, Sergey Chernonog, Anton Lamtev, Maria Lesnichaya, Iurii Lezhenin, Andrey Novopashenny, Roman Svechnikov, Daria Tsikach, Konstantin Vasiliev, Evgeny Pyshkin, John Blake
2021 Electronics  
Both are designed on the basis of a third-party automatic speech recognition (ASR) library, Kaldi, which was incorporated into the StudyIntonation signal processing software core.  ...  The learner feedback for pronunciation assessment was also updated, and a conventional mechanism based on dynamic time warping (DTW) was combined with cross-recurrence quantification analysis  ...  To assess the recognition quality, the distance metric was calculated as the Levenshtein distance L, which shows how much two phoneme sequences differ [60].
doi:10.3390/electronics10030235 fatcat:io6bpmxglbeo5jp4bbz6depy5e
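(Background sketch: the entry above scores recognition quality by the Levenshtein distance between phoneme sequences. A generic edit-distance routine over phoneme lists, not the authors' code, is shown below.)

```python
def levenshtein(ref, hyp):
    """Edit distance between two phoneme sequences (lists of phoneme labels).

    Counts the minimum number of substitutions, insertions and deletions
    needed to turn ref into hyp.
    """
    r, h = len(ref), len(hyp)
    # dist[i][j] = edit distance between ref[:i] and hyp[:j]
    dist = [[0] * (h + 1) for _ in range(r + 1)]
    for i in range(r + 1):
        dist[i][0] = i
    for j in range(h + 1):
        dist[0][j] = j
    for i in range(1, r + 1):
        for j in range(1, h + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,        # deletion
                             dist[i][j - 1] + 1,        # insertion
                             dist[i - 1][j - 1] + sub)  # substitution / match
    return dist[r][h]

# Example: two substitutions separate these phoneme transcriptions.
print(levenshtein(["p", "l", "iy", "z"], ["p", "l", "ih", "s"]))  # -> 2
```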

A Review on Speech Recognition Technique

Santosh K. Gaikwad, Bharti W. Gawali, Pravin Yannawar
2010 International Journal of Computer Applications  
and also gives an overview of the techniques developed in each stage of speech recognition.  ...  Speech has the potential of being an important mode of interaction with the computer. This paper gives an overview of the major technological perspectives and an appreciation of the fundamental progress of speech recognition  ...  Dynamic time warping (DTW) is a typical approach for template-based matching in speech recognition; DTW stretches and compresses various sections of the utterance so as to find an alignment  ... 
doi:10.5120/1462-1976 fatcat:zx4z3uczqjh2hhnrtc46b3rycu

Dual stream speech recognition using articulatory syllable models

Antti Puurula, Dirk Van Compernolle
2010 International Journal of Speech Technology  
A baseline recognition system is enhanced by modeling articulations as sequences of syllables.  ...  Promising results are obtained for DTWM classification and ASR tests. We provide a discussion of the remaining problems in implementing dual stream speech recognition.  ...  The standard fix to this in syllable-based acoustic modeling has been to back off to phoneme-based acoustic models.  ... 
doi:10.1007/s10772-010-9080-2 fatcat:lbv57lqs5najhibiqqebyyosje

Combining Dynamic Time Warping and Single Hidden Layer Feedforward Neural Networks for Temporal Sign Language Recognition

Ngoc Anh Nguyen Thi, Hyung-Jeong Yang, Sun-Hee Kim, Soo-Hyung Kim
2011 International Journal of Contents  
due to a very large database of sign sequences.  ...  Temporal Sign Language Recognition (TSLR) from hand motion is an active area of gesture recognition research, facilitating efficient communication with deaf people.  ...  We first apply DTW to the sign language sequences to perform time alignment. We then use the ELM machine learning system to classify based on the complete warped features from DTW.  ... 
doi:10.5392/ijoc.2011.7.1.014 fatcat:3bplpd2fxndxpjpffgol6e3uu4

Multi-speaker/speaker-independent architectures for the multi-state time delay neural network

H. Hild, A. Waibel
1993 IEEE International Conference on Acoustics Speech and Signal Processing  
In this paper we present an improved Multi-State Time Delay Neural Network (MS-TDNN) for speaker-independent, connected letter recognition which outperforms an HMM-based system (SPHINX) and previous MS-TDNNs  ...  In the "DTW Layer", each word to be recognized is modeled by a sequence of phonemes.  ...  In addition to the baseline system as introduced above, several techniques aimed at improving continuous recognition were used, including free alignment across word boundaries, word duration modeling  ... 
doi:10.1109/icassp.1993.319284 dblp:conf/icassp/HildW93 fatcat:xdsstwoq5nbf3gj5dciwmwq6ge

Using spoken words to guide open-ended category formation

Aneesh Chauhan, Luís Seabra Lopes
2011 Cognitive Processing  
difference between two phoneme sequences.  ...  where r and s are the numbers of phonemes in the phoneme sequences of W_p and W_q, respectively.  ... 
doi:10.1007/s10339-011-0407-y pmid:21614526 fatcat:xvdi25qxbrc2fmigegi6u3rdxm

Speech Analysis for Alphabets in Bangla Language: Automatic Speech Recognition

Asm SAYEM
2014 International Journal of Engineering Research  
Dynamic time warping (DTW) is employed to calculate the distance of an unknown letter from the stored ones. The k-nearest neighbors (KNN) algorithm is used to improve accuracy in noisy environments.  ...  Speech Recognition System: Speech is input via a microphone and its analog waveform is digitized.  ...  Finally, the outputs of DTW are fed into a k-nearest neighbors (k-NN) based classifier to obtain the word recognition performance.  ... 
doi:10.17950/ijer/v3s2/211 fatcat:ma3bhpkuuvaxrcqa3axixa52ve
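(Background sketch: the entry above feeds DTW distances into a k-NN classifier. The combination can be sketched as below; the templates, labels and the dtw callable are assumptions, e.g. the dtw_distance sketch shown earlier in this list.)

```python
from collections import Counter
import numpy as np

def knn_dtw_classify(sample, templates, labels, k=3, dtw=None):
    """Label an unknown utterance by the majority vote of its k nearest
    stored templates under DTW distance.

    sample    : feature sequence of the unknown letter/word
    templates : list of stored feature sequences
    labels    : list of labels, one per template
    dtw       : callable returning a DTW distance (e.g. dtw_distance above)
    """
    dists = np.array([dtw(sample, t) for t in templates])
    nearest = np.argsort(dists)[:k]          # indices of the k closest templates
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```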

Topic Identification for Speech without ASR [article]

Chunxi Liu, Jan Trmal, Matthew Wiesner, Craig Harman, Sanjeev Khudanpur
2017 arXiv   pre-print
Modern topic identification (topic ID) systems for speech use automatic speech recognition (ASR) to produce speech transcripts, and perform supervised classification on such ASR outputs.  ...  Moreover, using automatic phoneme-like tokenizations, we demonstrate that a convolutional neural network-based framework for learning spoken document representations provides competitive performance compared  ...  Identify word repetitions via fast diagonal line search and segmental DTW.  ... 
arXiv:1703.07476v2 fatcat:3nqqoqaiabcrvhx4dvpfx7pazq
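(Background sketch: the entry above identifies word repetitions via a fast diagonal line search followed by segmental DTW. A crude version of the diagonal-search stage is sketched below; the cosine-similarity local score, the threshold and all names are assumptions, and the segmental DTW refinement is omitted.)

```python
import numpy as np

def diagonal_matches(a, b, threshold=0.85, min_len=20):
    """Find candidate repeated regions between feature sequences a and b by
    searching for long runs of high frame-to-frame cosine similarity along
    diagonals of the similarity matrix.

    Returns a list of (start_a, start_b, length) triples; each is a seed
    that a segmental DTW pass could then refine.
    """
    a = np.asarray(a, float)
    b = np.asarray(b, float)
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-9)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-9)
    sim = a @ b.T                       # cosine similarity matrix
    hits = sim >= threshold
    n, m = hits.shape
    seeds = []
    for offset in range(-(n - 1), m):   # walk every diagonal
        diag = np.diagonal(hits, offset=offset)
        run = 0
        for idx, hit in enumerate(diag):
            run = run + 1 if hit else 0
            if run == min_len:          # record once per sufficiently long run
                i = idx - min_len + 1 + max(0, -offset)
                j = idx - min_len + 1 + max(0, offset)
                seeds.append((i, j, min_len))
    return seeds
```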

Exemplar-Based Processing for Speech Recognition: An Overview

Tara Sainath, Bhuvana Ramabhadran, David Nahamoo, Dimitri Kanevsky, Dirk Compernolle, Kris Demuynck, Jort Gemmeke, Jerome Bellegarda, Shiva Sundaram
2012 IEEE Signal Processing Magazine  
Solving real-world classification and recognition problems requires a principled way of modeling the physical phenomena generating the observed data and the uncertainty in it.  ...  The goal of modeling is to establish a generalization from the set of observed data such that accurate inference (classification, decision, recognition) can be made about the data yet to be observed, which  ...  In addition, in template-based processing, dynamic time warping (DTW) is used to compare variable-length sequences of frames [11], [26].
doi:10.1109/msp.2012.2208663 fatcat:uscjurhrejgctasb6sc5t4paca

Spoken Content Retrieval—Beyond Cascading Speech Recognition with Text Retrieval

Lin-shan Lee, James Glass, Hung-yi Lee, Chun-an Chan
2015 IEEE/ACM Transactions on Audio Speech and Language Processing  
Spoken content retrieval refers to directly indexing and retrieving spoken content based on the audio rather than text descriptions.  ...  Spoken content retrieval has been very successfully achieved with the basic approach of cascading automatic speech recognition (ASR) with text information retrieval: after the spoken content is transcribed  ...  With this approach, the spoken content is first converted into word sequences or lattices via ASR.  ... 
doi:10.1109/taslp.2015.2438543 fatcat:hwrwmwtlkzfbfagox7bazu5r6a

Unsupervised training of an HMM-based self-organizing unit recognizer with applications to topic classification and keyword discovery

Man-hung Siu, Herbert Gish, Arthur Chan, William Belfield, Steve Lowe
2014 Computer Speech and Language  
On the Switchboard corpus, the unsupervised HMM-based SOU recognizer, initialized with a segmental tokenizer, performed competitively with an HMM-based phoneme recognizer trained with 1 h of transcribed  ...  Specifically, we propose building HMM-based speech recognizers without transcribed data by formulating the HMM training as an optimization over both the parameter and transcription sequence space.  ...  Similar to supervised STT, where words are mapped to a sequence of phonemes via a dictionary, one can form bigger word-like units via a mapping to a sequence of basic units.  ... 
doi:10.1016/j.csl.2013.05.002 fatcat:7bwh437smbgcphmb42fosuicoe
Showing results 1 — 15 out of 316 results