Re-estimation of lexical parameters for treebank PCFGs

Tejaswini Deoskar
2008 Proceedings of the 22nd International Conference on Computational Linguistics - COLING '08
We present procedures that pool lexical information estimated from unlabeled data via the Inside-Outside algorithm with lexical information from a treebank PCFG. The procedures produce substantial improvements (up to 31.6% error reduction) on the task of determining subcategorization frames of novel verbs, relative to a smoothed Penn Treebank-trained PCFG. Even with relatively small quantities of unlabeled training data, the re-estimated models show promising improvements in labeled ... f-scores on Wall Street Journal parsing, and substantial benefit in acquiring the subcategorization preferences of low-frequency verbs.
doi:10.3115/1599081.1599106 fatcat:3ekcxxu4enc5vipm3s4ym4vrc4