Bilinear Representation for Language-based Image Editing Using Conditional Generative Adversarial Networks [article]

Xiaofeng Mao, Yuefeng Chen, Yuhong Li, Tao Xiong, Yuan He, Hui Xue
2019 arXiv   pre-print
The task of Language-Based Image Editing (LBIE) aims at generating a target image by editing the source image based on the given language description.  ...  Therefore, the editing performance is heavily dependent on the learned representation. In this work, conditional generative adversarial network (cGAN) is utilized for LBIE.  ...  In this work, we first theoretically analyse these works which edit the image based on fused visual-text representations using different conditioning methods.  ... 
arXiv:1903.07499v1 fatcat:t7okn2yyyjhbxiydcq3wjakgku

Image Modification using Text with GANs

Fenil Doshi, Parth Doshi, Jimit Gandhi, Khushmann Dwivedi, Dr. Ramchandra Mangrulkar
2020 International Journal of Computer Applications Technology and Research  
This paper works towards finding an effective solution for the task of image feature manipulation using natural language commands.  ...  Majority of the research in this domain focuses on generating completely new images using natural language description, and the few methodologies which attempt manipulation of existing images super in  ...  Generative Adversarial Networks (GANs) [4]: The authors use a variant of GANs known as conditional GANs where the generation of data samples is conditioned on some external feature.  ... 
doi:10.7753/ijcatr0911.1001 fatcat:ujisbfkwmjec5gpnn3l7qoirn4

Multimodal Intelligence: Representation Learning, Information Fusion, and Applications [article]

Chao Zhang, Zichao Yang, Xiaodong He, Li Deng
2020 arXiv   pre-print
Regarding applications, selected areas of a broad interest in the current literature are covered, including image-to-text caption generation, text-to-image generation, and visual question answering.  ...  Regarding multimodal fusion, this review focuses on special architectures for the integration of representations of unimodal signals for a particular task.  ...  ACKNOWLEDGEMENT The authors are grateful to the editor and anonymous reviewers for their valuable suggestions that helped to make this paper better.  ... 
arXiv:1911.03977v3 fatcat:ojazuw3qzvfqrdweul6qdpxuo4

Learning by Planning: Language-Guided Global Image Editing [article]

Jing Shi, Ning Xu, Yihang Xu, Trung Bui, Franck Dernoncourt, Chenliang Xu
2021 arXiv   pre-print
Recently, language-guided global image editing draws increasing attention with growing application potentials.  ...  Hence, we propose a novel operation planning algorithm to generate possible editing sequences from the target image as pseudo ground truth.  ...  Related Work Language-based image editing. Language-based image editing tasks can be categorized into one-turn and multiturn editing.  ... 
arXiv:2106.13156v1 fatcat:m7xc5urrbreo7da4s2c7jnl364

Scribbler: Controlling Deep Image Synthesis with Sketch and Color [article]

Patsorn Sangkloy, Jingwan Lu, Chen Fang, Fisher Yu, James Hays
2016 arXiv   pre-print
In this paper, we propose a deep adversarial image synthesis architecture that is conditioned on sketched boundaries and sparse color strokes to generate realistic cars, bedrooms, or faces.  ...  We demonstrate a sketch based image synthesis system which allows users to 'scribble' over the sketch to indicate preferred color for objects.  ...  Acknowledgments We thank Yijun Li for assistance with generation of synthetic training sketches from [11].  ... 
arXiv:1612.00835v2 fatcat:dq4wxxj2xjc7lcofgnkfnxvk7e

New Ideas and Trends in Deep Multimodal Content Understanding: A Review

Wei Chen, Weiping Wang, Li Liu, Michael S. Lew
2020 Neurocomputing  
, including auto-encoders, generative adversarial nets and their variants.  ...  These models go beyond the simple image classifiers in which they can do uni-directional (e.g. image captioning, image generation) and bi-directional (e.g. cross-modal retrieval, visual question answering  ...  Generative adversarial networks As depicted in Fig. 4, adversarial learning from generative adversarial networks [22] has been employed into applications including image captioning [28, 121, 123]  ... 
doi:10.1016/j.neucom.2020.10.042 fatcat:hyjkj5enozfrvgzxy6avtbmoxu

New Ideas and Trends in Deep Multimodal Content Understanding: A Review [article]

Wei Chen and Weiping Wang and Li Liu and Michael S. Lew
2020 arXiv   pre-print
, including auto-encoders, generative adversarial nets and their variants.  ...  These models go beyond the simple image classifiers in which they can do uni-directional (e.g. image captioning, image generation) and bi-directional (e.g. cross-modal retrieval, visual question answering  ...  We appreciate the helpful editing work from Dr. Erwin Bakker.  ... 
arXiv:2010.08189v1 fatcat:2l7molbcn5hf3oyhe3l52tdwra

Image Manipulation with Natural Language using Two-sided Attentive Conditional Generative Adversarial Network [article]

Dawei Zhu, Aditya Mogadala, Dietrich Klakow
2019 arXiv   pre-print
TEA-cGAN uses fine-grained attention both in the generator and discriminator of Generative Adversarial Network (GAN) based framework at different scales.  ...  We propose the Two-sidEd Attentive conditional Generative Adversarial Network (TEA-cGAN) to generate semantically manipulated images while preserving other contents such as background intact.  ...  To the best of our knowledge, none of the previous works propose attention over conditional Generative Adversarial Network (cGAN) in a generator for fine-grained image manipulation with natural language  ... 
arXiv:1912.07478v1 fatcat:kwk6wafaanfstafp27ozzszi64

A comprehensive survey on semantic facial attribute editing using generative adversarial networks [article]

Ahmad Nickabadi, Maryam Saeedi Fard, Nastaran Moradzadeh Farid, Najmeh Mohammadbagheri
2022 arXiv   pre-print
Generating random photo-realistic images has experienced tremendous growth during the past few years due to the advances of the deep convolutional neural networks and generative models.  ...  Based on their architectures, the state-of-the-art models are categorized and studied as encoder-decoder, image-to-image, and photo-guided models.  ...  [203] have proposed an adversarially regularized U-net (ARU-net)-based generative adversarial networks (ARU-GANs) for facial attribute generation and modification.  ... 
arXiv:2205.10587v1 fatcat:thpe4crcgndifb5mhtuveww4ji

A Review on Explainability in Multimodal Deep Neural Nets

Gargi Joshi, Rahee Walambe, Ketan Kotecha
2021 IEEE Access  
Several topics on multimodal AI and its applications for generic domains have been covered in this paper, including the significance, datasets, fundamental building blocks of the methods and techniques  ...  This paper extensively reviews the present literature to present a comprehensive survey and commentary on the explainability in multimodal deep neural nets, especially for the vision and language tasks  ...  Adversarial examples can also be used for understanding neural networks.  ... 
doi:10.1109/access.2021.3070212 fatcat:5wtxr4nf7rbshk5zx7lzbtcram

Deep multimodal representation learning: a survey

Wenzhong Guo, Jianwen Wang, Shiping Wang
2019 IEEE Access  
This paper highlights on the key issues of newly developed technologies, such as encoder-decoder model, generative adversarial networks, and attention mechanism in a multimodal representation learning  ...  Due to the powerful representation ability with multiple levels of abstraction, deep learning-based multimodal representation learning has attracted much attention in recent years.  ...  [23] alternatively propose to utilize Multimodal Compact Bilinear pooling (MCB) to fuse language and image modalities.  ... 
doi:10.1109/access.2019.2916887 fatcat:ms4wcgl5rncsbiywz27uss4ysq

GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields [article]

Michael Niemeyer, Andreas Geiger
2021 arXiv   pre-print
Deep generative models allow for photorealistic image synthesis at high resolutions. But for many applications, this is not enough: content creation also needs to be controllable.  ...  Our key hypothesis is that incorporating a compositional 3D scene representation into the generative model leads to more controllable image synthesis.  ...  Generative adversarial networks for image rium.  ... 
arXiv:2011.12100v2 fatcat:bziey2wodzgknfznl3frlbnoii

Language-Based Image Editing with Recurrent Attentive Models [article]

Jianbo Chen, Yelong Shen, Jianfeng Gao, Jingjing Liu, Xiaodong Liu
2018 arXiv   pre-print
We propose a generic modeling framework for two sub-tasks of LBIE: language-based image segmentation and image colorization.  ...  Given a source image and a natural language description, we want to generate a target image by editing the source image based on the description.  ...  Conditional GANs in image generation Generative adversarial networks (GANs) [6] have been widely used for image generation.  ... 
arXiv:1711.06288v2 fatcat:lurma7w6zvcirnhbgnikgdbv6i

Multimodal Research in Vision and Language: A Review of Current and Emerging Trends [article]

Shagun Uppal, Sarthak Bhagat, Devamanyu Hazarika, Navonil Majumdar, Soujanya Poria, Roger Zimmermann, Amir Zadeh
2020 arXiv   pre-print
We look at its applications in their task formulations and how to solve various problems related to semantic perception and content generation.  ...  In this paper, we present a detailed overview of the latest trends in research pertaining to visual and language modalities.  ...  For the tasks with visual outputs, R-precision [340] was introduced for retrieval-based algorithms, later used for language-to-image generation task.  ... 
arXiv:2010.09522v2 fatcat:l4npstkoqndhzn6hznr7eeys4u

Generating Compositional Color Representations from Text [article]

Paridhi Maheshwari, Nihal Jain, Praneetha Vaddamanu, Dhananjay Raut, Shraiysh Vaishay, Vishwa Vinay
2021 arXiv   pre-print
Motivated by the fact that a significant fraction of user queries on an image search engine follow an (attribute, object) structure, we propose a generative adversarial network that generates color profiles  ...  We consider the cross-modal task of producing color representations for text phrases.  ...  Figure 5: Generative adversarial network for learning color representations of (attribute, object) text phrases.  ... 
arXiv:2109.10477v1 fatcat:iqjwbngzs5hdpbry65tqntidz4