Neural text generation models that are conditioned on a given input (e.g., machine translation and image captioning) are typically trained through maximum likelihood estimation of the target text. However, models trained in this manner often suffer from various types of errors when making subsequent inferences. In this study, we propose suppressing an arbitrary type of error by training the text generation model in a reinforcement learning framework; herein, we use a trainable reward function

doi:10.5715/jnlp.28.751
fatcat:vlqbmsu7rnefzhbpepipixr62m