Unsupervised Vocabulary Adaptation for Morph-based Language Models

André Mansikkaniemi, Mikko Kurimo
2012 North American Chapter of the Association for Computational Linguistics  
Modeling of foreign entity names is an important unsolved problem in morpheme-based language modeling, which is common for morphologically rich languages. In this paper we present an unsupervised vocabulary adaptation method for morph-based speech recognition. Foreign word candidates are detected automatically from in-domain text through the use of letter n-gram perplexity. Over-segmented foreign entity names are restored to their base forms in the morph-segmented in-domain text for easier and more ... modeling and recognition. The adapted pronunciation rules are finally generated with a trainable grapheme-to-phoneme converter. In terms of ASR performance, the unsupervised method almost matches supervised adaptation in correctly recognizing foreign entity names.
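The first two adaptation steps (perplexity-based detection of foreign word candidates and restoring over-segmented names to base forms) can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state. The letter model is a trigram model with add-one smoothing, trained on a list of native words. A word counts as a foreign candidate when its letter perplexity exceeds a fixed threshold. Morph-segmented text is represented as one list of morphs per word. All names (`LetterNgramModel`, `detect_foreign`, `restore_base_forms`, `threshold`) and the toy word lists are hypothetical, and the paper's actual model order, smoothing, and selection criterion may differ. The grapheme-to-phoneme step is not sketched.

```python
import math
from collections import Counter

BOS, EOS = "<", ">"  # assumed word-boundary padding symbols (hypothetical)


def letter_ngrams(word, n):
    """Return the letter n-grams of a word padded with boundary symbols."""
    padded = BOS * (n - 1) + word + EOS
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class LetterNgramModel:
    """Letter n-gram model with add-one (Laplace) smoothing (assumed setup)."""

    def __init__(self, words, n=3):
        self.n = n
        self.ngram_counts = Counter()
        self.context_counts = Counter()
        symbols = {EOS}  # predictable symbols: training letters + end marker
        for word in words:
            symbols.update(word)
            for gram in letter_ngrams(word, n):
                self.ngram_counts[gram] += 1
                self.context_counts[gram[:-1]] += 1
        self.vocab_size = len(symbols)

    def prob(self, gram):
        # P(last letter | preceding n-1 symbols), add-one smoothed.
        return (self.ngram_counts[gram] + 1) / (
            self.context_counts[gram[:-1]] + self.vocab_size
        )

    def perplexity(self, word):
        # Per-letter perplexity: exp of the average negative log-probability.
        grams = letter_ngrams(word, self.n)
        log_prob = sum(math.log(self.prob(g)) for g in grams)
        return math.exp(-log_prob / len(grams))


def detect_foreign(words, model, threshold):
    """Flag words whose letter perplexity exceeds a hypothetical threshold."""
    return {w for w in words if model.perplexity(w) > threshold}


def restore_base_forms(segmented_words, foreign):
    """Join the morphs of flagged words back into a single base-form unit."""
    restored = []
    for morphs in segmented_words:
        joined = "".join(morphs)
        restored.append([joined] if joined in foreign else list(morphs))
    return restored


if __name__ == "__main__":
    # Toy native (Finnish) training words; purely illustrative data.
    native = ["talo", "talossa", "kissa", "kissalla", "koira", "koiralla",
              "sauna", "saunassa", "kala", "kalassa", "metsä", "metsässä",
              "pallo", "pallolla", "kirja", "kirjassa"]
    model = LetterNgramModel(native, n=3)
    in_domain = ["talolla", "schwarzenegger"]
    for w in in_domain:
        print(w, round(model.perplexity(w), 1))
    foreign = detect_foreign(in_domain, model, threshold=10.0)
    print(restore_base_forms([["talo", "lla"], ["schwarz", "en", "egger"]], foreign))
```

With this toy data, the unseen native form "talolla" shares most of its trigrams with the training words and gets a much lower perplexity than "schwarzenegger", whose trigrams are almost all unseen. Only the latter is flagged and merged back into a single unit. In practice the threshold would need tuning on in-domain data.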
dblp:conf/naacl/MansikkaniemiK12 fatcat:7yeb5bpttzfk3dfxn6gywodqxi