Semantically Consistent Hierarchical Text to Fashion Image Synthesis with an Enhanced-Attentional Generative Adversarial Network

Kenan Emir Ak, Joo Hwee Lim, Jo Yew Tham, Ashraf Kassim
2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW)
In this paper, we present the enhanced Attentional Generative Adversarial Network (e-AttnGAN), which improves training stability for text-to-image synthesis. e-AttnGAN's integrated attention module uses both sentence and word context features and applies feature-wise linear modulation (FiLM) to fuse visual and natural language representations. In addition to the multimodal similarity learning for text and image features from AttnGAN [28], cosine and feature matching losses between real and generated images are included, along with a classification loss for "significant attributes". To improve training stability and address mode collapse, spectral normalization and a two time-scale update rule for the discriminator are used together with instance noise. Our experiments show that e-AttnGAN outperforms state-of-the-art methods on the FashionGen and DeepFashion-Synthesis datasets.
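The FiLM fusion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract only states that FiLM fuses text and image representations, so the helper names (`linear`, `film`), the toy weights, and the shapes below are all hypothetical. The sketch assumes the usual FiLM form, in which a conditioning vector (here a text embedding) is mapped to a per-channel scale `gamma` and shift `beta` that modulate the image feature map.

```python
from typing import List, Sequence


def linear(weights: Sequence[Sequence[float]],
           bias: Sequence[float],
           x: Sequence[float]) -> List[float]:
    """Small dense layer: returns W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def film(feature_map: Sequence[Sequence[float]],
         gamma: Sequence[float],
         beta: Sequence[float]) -> List[List[float]]:
    """Feature-wise linear modulation.

    feature_map is indexed as [channel][position]; each channel c is
    scaled by gamma[c] and shifted by beta[c].
    """
    if not (len(feature_map) == len(gamma) == len(beta)):
        raise ValueError("need one gamma and one beta per channel")
    return [[g * v + b for v in channel]
            for channel, g, b in zip(feature_map, gamma, beta)]


# Hypothetical toy setup: a 2-d text embedding conditions a feature map
# with 2 channels and 3 spatial positions per channel.
text_embedding = [1.0, -1.0]

# Assumed (made-up) projection weights that produce gamma and beta.
w_gamma, b_gamma = [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0]
w_beta, b_beta = [[0.5, 0.5], [1.0, 0.0]], [0.0, 0.0]

gamma = linear(w_gamma, b_gamma, text_embedding)
beta = linear(w_beta, b_beta, text_embedding)

image_features = [[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]]
modulated = film(image_features, gamma, beta)
print(modulated)
```

In a real model the projections would be learned layers and the feature map would be a 4-d tensor in a deep learning framework; the nested lists here only keep the arithmetic visible.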
doi:10.1109/iccvw.2019.00379 dblp:conf/iccvw/AkLTK19 fatcat:cit4iochkzezrg6lyb6mgaky5i