Hallucinated n-best lists for discriminative language modeling

K. Sagae, M. Lehr, E. Prud'hommeaux, P. Xu, N. Glenn, D. Karakos, S. Khudanpur, B. Roark, M. Saraclar, I. Shafran, D. Bikel, C. Callison-Burch (+7 others)
2012 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)  
This paper investigates semi-supervised methods for discriminative language modeling, whereby n-best lists are "hallucinated" for given reference text and are then used for training n-gram language models using the perceptron algorithm. We perform controlled experiments on a very strong baseline English CTS system, comparing three methods for simulating ASR output, and compare the results with training with "real" n-best list output from the baseline recognizer. We find that methods based on extracting phrasal cohorts - similar to methods from machine translation for extracting phrase tables - yielded the largest gains of our three methods, achieving over half of the WER reduction of the fully supervised methods.
doi:10.1109/icassp.2012.6289043 dblp:conf/icassp/SagaeLPXGKKRSSBCCHHKLPR12 fatcat:emi5pcwldrgdrlvi26wtyh6aqy