Image Captioning Algorithm Based on Multi-Branch CNN and Bi-LSTM

Shan HE, Yuanyao LU, Shengnan CHEN
2021 IEICE transactions on information and systems  
The development of deep learning and neural networks has brought broad prospects to computer vision and natural language processing. The image captioning task combines cutting-edge methods in two fields. By building an end-to-end encoder-decoder model, its description performance can be greatly improved. In this paper, the multi-branch deep convolutional neural network is used as the encoder to extract image features, and the recurrent neural network is used to generate descriptive text that
more » ... ches the input image. We conducted experiments on Flickr8k, Flickr30k and MSCOCO datasets. According to the analysis of the experimental results on evaluation metrics, the model proposed in this paper can effectively achieve image caption, and its performance is better than classic image captioning models such as neural image annotation models. key words: image captioning, multi-branch CNN, Bi-LSTM, encoderdecoder
doi:10.1587/transinf.2020edp7227 fatcat:45tq63dubffjrey47fkrlk6y6u