Learning to Generate Textual Data

Guillaume Bouchard, Pontus Stenetorp, Sebastian Riedel
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016)
To learn text understanding models with millions of parameters, one needs massive amounts of data. In this work, we argue that generating data can compensate for this need. While defining generic data generators is difficult, we propose to allow generators to be "weakly" specified, in the sense that a set of parameters controls how the data is generated. Consider, for example, generators where the example templates, grammar, and/or vocabulary are determined by this set of parameters. Instead of manually tuning these parameters, we learn them from the limited training data at our disposal. To achieve this, we derive an efficient algorithm called GENERE that jointly estimates the parameters of the model and the undetermined generation parameters. We illustrate its benefits by learning to solve math exam questions using a highly parametrized sequence-to-sequence neural network.
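To make the idea of a "weakly" specified generator concrete, here is a minimal sketch (not the paper's GENERE algorithm): a question generator whose templates are fixed but whose sampling distribution over templates is a learnable generation parameter. The templates, the lexical-overlap scoring rule used to fit that distribution, and all function names are illustrative assumptions, standing in for the joint estimation described in the abstract.

```python
import random

# Hypothetical weakly specified generator: the templates are hand-written,
# but the distribution over them is a generation parameter to be learned.
TEMPLATES = [
    lambda a, b: (f"What is {a} plus {b}?", a + b),
    lambda a, b: (f"What is {a} times {b}?", a * b),
]

def generate(weights, n, rng):
    """Sample n (question, answer) pairs; `weights` are the generation parameters."""
    pairs = []
    for _ in range(n):
        template = rng.choices(TEMPLATES, weights=weights, k=1)[0]
        pairs.append(template(rng.randint(0, 9), rng.randint(0, 9)))
    return pairs

def fit_generator(real_questions, rng, samples_per_template=20):
    """Toy stand-in for joint estimation: score each template by lexical
    overlap between its samples and the limited real data, then
    renormalize the scores into a sampling distribution."""
    real_words = {w for q in real_questions for w in q.split()}
    scores = []
    for template in TEMPLATES:
        drawn = [template(rng.randint(0, 9), rng.randint(0, 9))[0]
                 for _ in range(samples_per_template)]
        overlap = sum(len(set(q.split()) & real_words) for q in drawn)
        scores.append(overlap / samples_per_template)
    total = sum(scores)
    return [s / total for s in scores]
```

Given a handful of real "plus" questions, `fit_generator` shifts the weights toward the addition template, after which `generate` can produce arbitrarily many synthetic training pairs resembling the real data. The actual algorithm instead couples the generator parameters to the downstream model's parameters during training.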
doi:10.18653/v1/d16-1167 dblp:conf/emnlp/BouchardSR16