Feedback evaluations to promote image captioningFeedback evaluations to promote image captioning
IET Image Processing
Image captioning can be treated as a policy gradient problem. A retrieval model to obtain the discriminability score to distinguish between two images, given the caption for one of them, has been proposed previously; the discriminability score and one of the image captioning evaluation metrics were optimised using policy gradient. Based on this, two methods to evaluate the caption and caption-generating process, referred to as feedback evaluations, are proposed in this study. The results of the
... evaluations were used to improve the model. First, an auxiliary retrieval loss (ARL) is introduced to evaluate the generated caption to improve the discriminability of the model. ARL has been utilised as a feedback evaluation method because it calculates similarity between the generated caption and convolutional neural network features. With ARL, a higher similarity and better discriminability were achieved. Second, an evaluation reward is introduced to evaluate the captioning process. With ER, the overall evaluation metrics can be improved. A policy gradient was used, and a captioning model could be trained by jointly adjusting the captioning process and captioning itself. The attention long short-term memory network was trained with ARL and ER successively and it demonstrated state-of-the-art performance on the COCO database.