Measuring the Influence of Long Range Dependencies with Neural Network Language Models

Hai Son Le, Alexandre Allauzen, François Yvon
Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2012)
In spite of their well-known limitations, most notably their use of very local contexts, n-gram language models remain an essential component of many Natural Language Processing applications, such as Automatic Speech Recognition or Statistical Machine Translation. This paper investigates the potential of language models using larger context windows comprising up to the 9 previous words. This study is made possible by the development of several novel Neural Network Language Model architectures, which can easily cope with such large context windows. We experimentally observed that extending the context size yields clear gains in terms of perplexity, that the n-gram assumption is statistically reasonable as long as n is sufficiently high, and that efforts should be focused on improving the estimation procedures for such large models.
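
To make the two ideas in the abstract concrete — a feed-forward neural network language model conditioned on the n-1 previous words, and evaluation by perplexity — here is a minimal numpy sketch. It is an illustration of the general Bengio-style feed-forward NNLM setup with a 9-word context, not the specific architectures evaluated in the paper; the vocabulary size, layer dimensions, initialization, and all variable names are placeholder assumptions.

```python
# Minimal feed-forward neural network language model sketch (numpy).
# Illustrative only: sizes and initialization are placeholder assumptions,
# not the architectures or hyperparameters used in the paper.
import numpy as np

rng = np.random.default_rng(0)

V = 1000      # vocabulary size (assumed for illustration)
d = 32        # word embedding dimension (assumed)
h = 64        # hidden layer size (assumed)
n = 10        # model order: condition on the 9 previous words

# Parameters: embedding table, context-to-hidden layer, output layer.
R = rng.normal(0, 0.01, (V, d))            # input word embeddings
W = rng.normal(0, 0.01, ((n - 1) * d, h))  # concatenated context -> hidden
U = rng.normal(0, 0.01, (h, V))            # hidden -> vocabulary scores

def next_word_distribution(context):
    """P(w | n-1 previous words): concatenate the context embeddings,
    apply one tanh hidden layer, and normalize with a softmax."""
    assert len(context) == n - 1
    x = np.concatenate([R[w] for w in context])   # shape ((n-1)*d,)
    hidden = np.tanh(x @ W)                       # shape (h,)
    scores = hidden @ U                           # shape (V,)
    scores -= scores.max()                        # numerical stability
    p = np.exp(scores)
    return p / p.sum()

def perplexity(tokens):
    """Perplexity of a token sequence under the model; the first
    n-1 tokens serve only as initial context."""
    log_prob, count = 0.0, 0
    for i in range(n - 1, len(tokens)):
        p = next_word_distribution(tokens[i - n + 1:i])
        log_prob += np.log(p[tokens[i]])
        count += 1
    return float(np.exp(-log_prob / count))

# Example: an untrained model is near-uniform, so perplexity is close to V.
tokens = rng.integers(0, V, size=200)
print(perplexity(tokens))
```

Training would drive the perplexity down from the near-uniform value of roughly V; the paper's question is how much such gains persist as the context length n-1 grows toward 9, compared with the short contexts of conventional n-gram models.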