Static interpolation of exponential n-gram models using features of features

Abhinav Sethy, Stanley Chen, Bhuvana Ramabhadran, Paul Vozila
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
The best language model performance for a task is often achieved by interpolating language models built separately on corpora from multiple sources. While common practice is to combine models using a single set of fixed interpolation weights, past work has found that gains can be had by allowing weights to vary by n-gram when linearly interpolating word n-gram models. In this work, we investigate whether similar ideas can be used to improve log-linear interpolation for Model M, an exponential class-based n-gram model with state-of-the-art performance. We focus on log-linear interpolation because Model M instances combined via (regular) linear interpolation cannot be statically compiled into a single model, as is required by many applications due to resource constraints. We present a general parameter interpolation framework in which a weight prediction model computes the interpolation weights for each n-gram. The weight prediction model takes a rich representation of n-gram features as input and is trained to optimize the perplexity of a held-out set. In experiments on Broadcast News, we show that a mixture-of-experts weight prediction model yields significant perplexity and word-error-rate improvements as compared to static linear interpolation.
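For reference, the standard contrast between linear and log-linear interpolation of component models p_i can be written as follows (notation ours; the per-n-gram scheme in the paper generalizes the fixed weights shown here to functions of the n-gram):

```latex
% Linear interpolation: a convex combination of component models.
P_{\text{lin}}(w \mid h) = \sum_{i} \lambda_i \, p_i(w \mid h),
  \qquad \lambda_i \ge 0, \; \sum_i \lambda_i = 1.

% Log-linear interpolation: a weighted product of component models,
% renormalized over the vocabulary so the result is a proper distribution.
P_{\text{log}}(w \mid h) =
  \frac{\prod_i p_i(w \mid h)^{\lambda_i}}
       {\sum_{w'} \prod_i p_i(w' \mid h)^{\lambda_i}}.
```

The log-linear form matters here because combining exponential models log-linearly amounts to summing their (weighted) feature parameters, so the result is again a single exponential model that can be compiled statically; a linear combination of exponential models has no such single-model form.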
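A minimal sketch of how a mixture-of-experts gate might map n-gram features to interpolation weights. All names, shapes, and the simplification that the weights depend only on a single feature vector are our illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class MoEWeightPredictor:
    """Mixture of experts (hypothetical API): a softmax gate over experts,
    each expert holding one interpolation weight per component LM."""
    def __init__(self, n_features, n_experts, n_components, seed=0):
        rng = np.random.default_rng(seed)
        self.gate_W = rng.normal(0.0, 0.1, (n_features, n_experts))
        self.expert_lambdas = rng.normal(0.0, 0.1, (n_experts, n_components))

    def weights(self, feats):
        """Map an n-gram feature vector to per-component weights lambda."""
        gate = softmax(feats @ self.gate_W)        # (n_experts,)
        return gate @ self.expert_lambdas          # convex mix of expert lambdas

def log_linear_interpolate(component_logprobs, lambdas):
    """Weighted sum of component log-probs, renormalized over the vocabulary.
    component_logprobs: (n_components, vocab); lambdas: (n_components,)."""
    return softmax(lambdas @ component_logprobs)   # (vocab,)

# Toy usage: two component LMs over a 4-word vocabulary, 3 n-gram features.
comp_logp = np.log(np.array([[0.4, 0.3, 0.2, 0.1],
                             [0.1, 0.2, 0.3, 0.4]]))
moe = MoEWeightPredictor(n_features=3, n_experts=2, n_components=2)
lam = moe.weights(np.array([1.0, 0.5, -0.3]))
probs = log_linear_interpolate(comp_logp, lam)
print(probs, probs.sum())  # a proper distribution; sums to 1
```

In the paper the predictor is trained to optimize held-out perplexity; in this sketch that would correspond to minimizing the negative log-likelihood of held-out n-grams with respect to the gate and expert parameters (training loop omitted).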
doi:10.1109/icassp.2014.6854529 dblp:conf/icassp/SethyCRV14