Cache-Augmented Latent Topic Language Models for Speech Retrieval

Jonathan Wintrode
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop, 2015
We aim to improve speech retrieval performance by augmenting traditional N-gram language models with different types of topic context. We present a latent topic model framework that treats documents as arising from an underlying topic sequence combined with a cache-based repetition model. We analyze our proposed model both for its ability to capture word repetition via the cache and for its suitability as a language model for speech recognition and retrieval. We show that this model, augmented with the cache, captures intuitive repetition behavior across languages and exhibits lower perplexity than standard LDA on held-out data in multiple languages. Lastly, we show that our joint model improves speech retrieval performance beyond N-grams or latent topics alone when applied to a term detection task in all languages considered.
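The cache-based repetition idea in the abstract can be illustrated with a standard cache language model interpolation: the probability of a word is a mixture of a static model's estimate and the word's relative frequency in the recently observed history. This is a minimal sketch of the general technique, not the paper's exact joint model; the function name, the `lam` weight, and the use of a fixed n-gram probability as input are all illustrative assumptions.

```python
from collections import Counter

def cache_interpolated_prob(word, history, ngram_prob, lam=0.9):
    """Illustrative cache LM interpolation (not the paper's exact model):
    P(w | h) = lam * P_ngram(w | h) + (1 - lam) * P_cache(w),
    where P_cache(w) is the relative frequency of w among the
    recently observed words in `history` (the cache)."""
    if not history:
        return ngram_prob  # empty cache: fall back to the static model
    cache = Counter(history)
    p_cache = cache[word] / len(history)
    return lam * ngram_prob + (1 - lam) * p_cache
```

Repeated words in the history raise their interpolated probability, which is the intuition behind using a cache to capture the bursty, topical repetition the abstract describes.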
doi:10.3115/v1/n15-2001 dblp:conf/naacl/Wintrode15 fatcat:lq27omy3s5evxnz5xqya7dy37i