A comparison-based approach to mispronunciation detection

Ann Lee, James Glass
2012 IEEE Spoken Language Technology Workshop (SLT)
Mispronunciation detection for language learning is typically accomplished with automatic speech recognition (ASR). Unfortunately, fewer than 2% of the world's languages have ASR capability, and the conventional process of building an ASR system requires large quantities of expensive, annotated data. In this paper we report on our efforts to develop a comparison-based framework for detecting word-level mispronunciations in nonnative speech. Dynamic time warping (DTW) is carried out between a student's (nonnative speaker) utterance and a teacher's (native speaker) utterance, and we focus on extracting word-level and phone-level features that describe the degree of misalignment in the warping path and the distance matrix. Experimental results on a Chinese University of Hong Kong (CUHK) nonnative corpus show that the proposed framework yields a relative improvement of nearly 50% on a mispronounced-word detection task compared to an approach that considers only DTW alignment scores.
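To make the comparison step concrete, the following is a minimal sketch of DTW between a student utterance and a teacher utterance, followed by a few word-level features read off the warping path and distance matrix. It assumes both utterances are already sequences of frame-level feature vectors and that the teacher's word boundaries (in frames) are known. Euclidean local distance is a stand-in assumption; the paper's actual frame representation and distance may differ. The functions `dtw` and `word_features` and the three features (`mean_dist`, `duration_ratio`, `off_diag_rate`) are hypothetical illustrations, not the feature set defined by the authors.

```python
import math
from typing import Dict, List, Sequence, Tuple

Frame = Sequence[float]
Path = List[Tuple[int, int]]


def local_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two frames (a stand-in assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw(student: List[Frame], teacher: List[Frame]):
    """Align student frames (rows i) to teacher frames (columns j).

    Returns (total_cost, path, dist): the accumulated cost at the end
    cell, the warping path from (0, 0) to (n-1, m-1), and the local
    distance matrix.
    """
    n, m = len(student), len(teacher)
    dist = [[local_distance(s, t) for t in teacher] for s in student]
    inf = float("inf")
    acc = [[inf] * m for _ in range(n)]
    acc[0][0] = dist[0][0]
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            # Standard step pattern: vertical, horizontal, or diagonal move.
            best = min(
                acc[i - 1][j] if i > 0 else inf,
                acc[i][j - 1] if j > 0 else inf,
                acc[i - 1][j - 1] if i > 0 and j > 0 else inf,
            )
            acc[i][j] = dist[i][j] + best
    # Backtrack from the end cell along the cheapest predecessor.
    path: Path = [(n - 1, m - 1)]
    i, j = n - 1, m - 1
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((acc[i - 1][j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((acc[i - 1][j], i - 1, j))
        if j > 0:
            candidates.append((acc[i][j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return acc[n - 1][m - 1], path, dist


def word_features(path: Path, dist, t_start: int, t_end: int) -> Dict[str, float]:
    """Hypothetical misalignment features for teacher frames [t_start, t_end).

    Every teacher column is visited by a DTW path, so the word span
    always has at least one path cell when t_start < t_end.
    """
    cells = [(i, j) for i, j in path if t_start <= j < t_end]
    student_frames = {i for i, _ in cells}
    # Average local distance along the path inside the word.
    mean_dist = sum(dist[i][j] for i, j in cells) / len(cells)
    # How many student frames were mapped onto this teacher word.
    duration_ratio = len(student_frames) / (t_end - t_start)
    # Fraction of steps inside the word that leave the diagonal.
    off_diag = sum(
        1
        for (i0, j0), (i1, j1) in zip(cells, cells[1:])
        if not (i1 == i0 + 1 and j1 == j0 + 1)
    )
    off_diag_rate = off_diag / max(len(cells) - 1, 1)
    return {
        "mean_dist": mean_dist,
        "duration_ratio": duration_ratio,
        "off_diag_rate": off_diag_rate,
    }
```

As a toy usage example with one-dimensional frames, a student who holds the first sound one frame longer than the teacher aligns at zero cost, but the stretch shows up in the first word's features:

```python
teacher = [[0.0], [1.0], [2.0], [3.0]]
student = [[0.0], [0.0], [1.0], [2.0], [3.0]]
cost, path, dist = dtw(student, teacher)
# path == [(0, 0), (1, 0), (2, 1), (3, 2), (4, 3)]
print(word_features(path, dist, 0, 2))
# {'mean_dist': 0.0, 'duration_ratio': 1.5, 'off_diag_rate': 0.5}
```

This illustrates why path-shape features can add information beyond the raw alignment score: the total DTW cost is zero here, yet the duration ratio and off-diagonal rate still flag the first word as misaligned.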
doi:10.1109/slt.2012.6424254 dblp:conf/slt/LeeG12