An efficient keyword spotting technique using a complementary language for filler models training

Panikos Heracleous, Tohru Shimizu
2003 8th European Conference on Speech Communication and Technology (Eurospeech 2003)   unpublished
The task of keyword spotting is to detect a set of keywords in continuous input speech. A keyword spotter must model not only the keywords but also the non-keyword intervals, and filler (or garbage) models are used for this purpose. To date, most keyword spotters have been based on hidden Markov models (HMMs); more specifically, a set of HMMs is used as garbage models. In this paper, a two-pass keyword spotting technique based on bilingual hidden Markov models is presented. In
the first pass, our technique uses phonemic garbage models to represent the non-keyword intervals, and in the second pass the putative hits are verified using normalized scores. The main difference from similar approaches lies in the way the non-keyword intervals are modeled. In this work, the target language is Japanese, and English was chosen as the 'garbage' language for training the phonemic garbage models. Experimental results on both clean and noisy telephone speech data showed higher performance than using a common set of acoustic models. Moreover, parameter tuning (e.g., word insertion penalty tuning) does not have a serious effect on performance. For a vocabulary of 100 keywords and clean telephone speech test data, we achieved a 92.04% recognition rate with only a 7.96% false alarm rate, without word insertion penalty tuning. With noisy telephone speech test data, we achieved an 87.29% recognition rate with only a 12.71% false alarm rate.
doi:10.21437/eurospeech.2003-323 fatcat:2ns4u336zjhg5pqasut6zpmm2e