Arabic Text to Image Generation based on Generative Network of Fine-Grained Visual Descriptions

S.M. Salem, M.L. Ramadan
2020 Benha Journal of Applied Sciences  
Converting natural language text descriptions into images is a challenging problem in computer vision with many practical applications. Text-to-image generation resembles language translation: just as the same semantics can be encoded in two different languages, images and text are two different "languages" for encoding related information. Nonetheless, the two problems differ fundamentally, because text-to-image and image-to-text conversion are highly multimodal, meaning that a single description can correspond to many plausible images. In this paper, we propose a model for Arabic text descriptions that performs multi-stage, attention-driven refinement for fine-grained Arabic text-to-image generation. Using a modern attentional generative network, the model synthesizes fine-grained details in different sub-regions of the image by attending to the relevant words in the natural Arabic language description. We train the model from scratch on the Modified-Arabic dataset. A key component of our network is a word-level, fine-grained image-text matching loss computed by the Deep Attentional Multimodal Similarity Model (DAMSM). The DAMSM learns two neural networks that map image sub-regions and the Arabic words of a sentence into a common semantic space. Our model achieves strong performance with its Arabic text encoder and image encoder, and it is characterized by ease and accuracy in describing the images of the Caltech-UCSD Birds 200-2011 dataset.
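The abstract names the DAMSM word-level matching loss but does not give its equations. The sketch below is an illustration only. It follows the DAMSM formulation from the original AttnGAN paper (Xu et al., 2018), where the model was introduced, and is not the authors' code. Several details are assumptions. Both encoders are assumed to have already produced word vectors e_i and sub-region vectors v_j of the same dimension. Features are stored as plain Python lists. The names `word_region_score` and `damsm_word_loss` are hypothetical. The γ1, γ2 and γ3 defaults are illustrative values, not tuned settings.

For each word, the sketch computes an attention over image sub-regions to build a region-context vector c_i. It then aggregates the cosine similarities R(c_i, e_i) into an image-sentence score R(Q, D). Finally, it applies a symmetric softmax loss so that each image picks out its own sentence within a batch, and each sentence picks out its own image.

```python
import math
from typing import List

# Assumption: each feature is a plain list of floats, and word and region
# features share the same dimension (the "common semantic space").
Vector = List[float]


def _softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def _cosine(a: Vector, b: Vector) -> float:
    # Assumes neither vector is all zeros.
    return _dot(a, b) / (math.sqrt(_dot(a, a)) * math.sqrt(_dot(b, b)))


def word_region_score(words: List[Vector], regions: List[Vector],
                      gamma1: float = 5.0, gamma2: float = 5.0) -> float:
    """R(Q, D): word-level match between one image's sub-regions and one sentence's words."""
    n_words, n_regions, dim = len(words), len(regions), len(regions[0])
    # s[i][j] = e_i . v_j, the raw similarity between word i and sub-region j.
    s = [[_dot(w, r) for r in regions] for w in words]
    # For each sub-region j, normalize its similarities over the words of the sentence.
    cols = [_softmax([s[i][j] for i in range(n_words)]) for j in range(n_regions)]
    total = 0.0
    for i, w in enumerate(words):
        # Attention of word i over the image sub-regions (sharpened by gamma1).
        alpha = _softmax([gamma1 * cols[j][i] for j in range(n_regions)])
        # Region-context vector c_i: attention-weighted sum of sub-region features.
        c = [sum(alpha[j] * regions[j][k] for j in range(n_regions)) for k in range(dim)]
        total += math.exp(gamma2 * _cosine(c, w))
    # R(Q, D) = (1 / gamma2) * log(sum_i exp(gamma2 * cos(c_i, e_i)))
    return math.log(total) / gamma2


def damsm_word_loss(images: List[List[Vector]], sentences: List[List[Vector]],
                    gamma3: float = 10.0) -> float:
    """Symmetric word-level loss over a batch where image i matches sentence i."""
    m = len(images)
    # r[i][j] = R(image i, sentence j) for every pair in the batch.
    r = [[word_region_score(sentences[j], images[i]) for j in range(m)] for i in range(m)]
    loss = 0.0
    for i in range(m):
        # P(D_i | Q_i): image i should select its own sentence among the batch.
        p_sentence = _softmax([gamma3 * r[i][j] for j in range(m)])[i]
        # P(Q_i | D_i): sentence i should select its own image among the batch.
        p_image = _softmax([gamma3 * r[j][i] for j in range(m)])[i]
        loss -= math.log(p_sentence) + math.log(p_image)
    return loss


if __name__ == "__main__":
    # Toy batch: image 0 and sentence 0 point along x; image 1 and sentence 1 along y.
    images = [[[1.0, 0.0], [1.0, 0.1]], [[0.0, 1.0], [0.1, 1.0]]]
    sentences = [[[1.0, 0.0]], [[0.0, 1.0]]]
    print(damsm_word_loss(images, sentences))        # small: pairs are aligned
    print(damsm_word_loss(images, sentences[::-1]))  # large: pairs are swapped
```

In the toy batch, swapping the sentences makes each image's best-scoring sentence belong to the other pair, so the loss rises sharply. For a one-word sentence, R(Q, D) reduces to the cosine similarity between that word and its region-context vector. A practical implementation would batch these loops as tensor operations in a deep learning framework.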
doi:10.21608/bjas.2020.226901