Statistical and Linguistic Clustering for Language Modeling in ASR [chapter]

R. Justo, I. Torres
2005 Lecture Notes in Computer Science  
In this work, several sets of categories obtained by a statistical clustering algorithm, as well as a linguistic set, were used to design category-based language models. The proposed language models were evaluated, as usual, in terms of perplexity of the text corpus. They were then integrated into an ASR system and also evaluated in terms of system performance. Category-based language models can perform better, also in terms of WER, when categories are obtained through ... ical models instead of using linguistic techniques. The results also show that better system performance is obtained when the language model interpolates category-based and word-based models.
doi:10.1007/11578079_58 fatcat:etksbknf4nculoand2rfsacwqy