Semantic n-gram language modeling with the latent maximum entropy principle

Shaojun Wang, D. Schuurmans, Fuchun Peng, Yunxin Zhao
2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).  
In this paper, we describe a unified probabilistic framework for statistical language modeling, the latent maximum entropy principle, which can effectively incorporate various aspects of natural language, such as local word interaction, syntactic structure and semantic document information. Unlike previous work on maximum entropy methods for language modeling, which only allow explicit features to be modeled, our framework also allows relationships over hidden features to be captured, resulting in a more expressive language model. We describe efficient algorithms for marginalization, inference and normalization in our extended models. We then present experimental results for our approach on the Wall Street Journal corpus.
doi:10.1109/icassp.2003.1198796 dblp:conf/icassp/WangSPZ03 fatcat:c3qdgkeewncm5lhez76pgnoyhu