Semantic Layout Manipulation with High-Resolution Sparse Attention [article]

Haitian Zheng, Zhe Lin, Jingwan Lu, Scott Cohen, Jianming Zhang, Ning Xu, Jiebo Luo
2022 arXiv   pre-print
To adapt this paradigm for the layout manipulation task, we propose a high-resolution sparse attention module that effectively transfers visual details to new layouts at a resolution up to 512x512.  ...  We tackle the problem of semantic image layout manipulation, which aims to manipulate an input image by editing its semantic label map.  ...  High-resolution Sparse Attention Image correspondence is often sparse [53], [54], meaning that for an image pair A, B, a pixel A often matches only with a sparse set of pixels from B.  ... 
arXiv:2012.07288v4 fatcat:3wx7o3z7azgmrigu4o6fygndo4

Controlling Style and Semantics in Weakly-Supervised Image Generation [article]

Dario Pavllo, Aurelien Lucchi, Thomas Hofmann
2020 arXiv   pre-print
In order to condition our model on textual descriptions, we introduce a semantic attention module whose computational cost is independent of the image resolution.  ...  We exploit sparse semantic maps to control object shapes and classes, as well as textual descriptions or attributes to control both local and global style.  ...  high-resolution settings.  ... 
arXiv:1912.03161v2 fatcat:gcl26aptnzhd3mmhnhu5tfrfoa

Fashion Editing with Adversarial Parsing Learning [article]

Haoye Dong, Xiaodan Liang, Yixuan Zhang, Xujie Zhang, Zhenyu Xie, Bowen Wu, Ziqi Zhang, Xiaohui Shen, Jian Yin
2019 arXiv   pre-print
Extensive experiments on high-resolution fashion image datasets demonstrate that the proposed method significantly outperforms the state-of-the-art methods on image manipulation.  ...  textures with semantic guidance from the human parsing map.  ...  Introduction Fashion image manipulation aims to generate high-resolution realistic fashion images with user-provided sketches and color strokes. It has huge potential values in various applications.  ... 
arXiv:1906.00884v2 fatcat:3i2kxaal7rh65dh7kw6absgila

Fashion Editing With Adversarial Parsing Learning

Haoye Dong, Xiaodan Liang, Yixuan Zhang, Xujie Zhang, Xiaohui Shen, Zhenyu Xie, Bowen Wu, Jian Yin
2020 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
Extensive experiments on high-resolution fashion image datasets demonstrate that the proposed FE-GAN significantly outperforms the state-of-the-art methods on fashion image manipulation.  ...  textures with semantic guidance from the human parsing map.  ...  multi-scale attention normalization layers, which can generate high-resolution realistic edited fashion images.  ... 
doi:10.1109/cvpr42600.2020.00814 dblp:conf/cvpr/DongLZZSXW020 fatcat:kre4h64vc5dohczeiboc3unlyi

Person-in-Context Synthesis with Compositional Structural Space [article]

Weidong Yin, Ziwei Liu, Leonid Sigal
2020 arXiv   pre-print
The context is specified by the bounding box object layout which lacks shape information, while pose of the person(s) by keypoints which are sparsely annotated.  ...  Despite significant progress, controlled generation of complex images with interacting people remains difficult.  ...  As can be seen we can generate complex images with multiple objects at high resolution and with realistic details.  ... 
arXiv:2008.12679v1 fatcat:nwksbmdsc5g73b45h7xsdwdthm

Adversarial Text-to-Image Synthesis: A Review [article]

Stanislav Frolov, Tobias Hinz, Federico Raue, Jörn Hees, Andreas Dengel
2021 arXiv   pre-print
However, the field still faces several challenges that require further research efforts such as enabling the generation of high-resolution images with multiple objects, and developing suitable and reliable  ...  It is a flexible and intuitive way for conditional image generation with significant progress in the last years regarding visual realism, diversity, and semantic alignment.  ...  The generator of PPAN applies a pyramid framework [44, 45] to combine low-resolution, semantically strong features with high-resolution, seman- StackGAN [33] and StackGAN++ [40] architectures.  ... 
arXiv:2101.09983v1 fatcat:as5i4mk4kndrzpcshlewkbgge4

Learning to Predict Layout-to-image Conditional Convolutions for Semantic Image Synthesis [article]

Xihui Liu, Guojun Yin, Jing Shao, Xiaogang Wang, Hongsheng Li
2020 arXiv   pre-print
Semantic image synthesis aims at generating photorealistic images from semantic layouts.  ...  Previous approaches with conditional generative adversarial networks (GAN) show state-of-the-art performance on this task, which either feed the semantic label maps as inputs to the generator, or use them  ...  Conclusion We propose a novel approach (CC-FPSE) for image synthesis from a given semantic layout via better using the semantic layout information to generate images with high-quality details and well  ... 
arXiv:1910.06809v3 fatcat:cttcvpisnbeufe3dt6myawin3i

A Unified Framework for Biphasic Facial Age Translation with Noisy-Semantic Guided Generative Adversarial Networks [article]

Muyi Sun, Jian Wang, Yunfan Liu, Qi Li, Zhenan Sun
2021 arXiv   pre-print
ProjectionNet introduces the low-level structural semantic information with noise map and produces soft latent maps.  ...  Structurally, we project the class-aware noisy semantic layouts to soft latent maps for the following injection operation on the individual facial parts.  ...  SLMNet [25] proposes a high-resolution sparse attention module that effectively transfers visual details to 'new' layouts at a resolution up to 512 × 512.  ... 
arXiv:2109.07373v1 fatcat:fo424mxpxrbh3dmimvsx5grjwq

InteriorNet: Mega-scale Multi-sensor Photo-realistic Indoor Scenes Dataset [article]

Wenbin Li, Dimos Tzoumanikas, Stefan Leutenegger (Department of Computing, Imperial College London, London UK, SW7 2AZ)
2018 arXiv   pre-print
Our dataset leverages the availability of millions of professional interior designs and millions of production-level furniture and object assets -- all coming with fine geometric details and high-resolution  ...  We render high-resolution and high frame-rate video sequences following realistic trajectories while supporting various camera types as well as providing inertial measurements.  ...  Acknowledgements We would like to thank for providing their database of production furniture models and layouts, as well as access to their GPU/CPU clusters.  ... 
arXiv:1809.00716v1 fatcat:xcav5mdu25gezjdzulqx252d6y

Video2StyleGAN: Encoding Video in Latent Space for Manipulation [article]

Jiyang Yu, Jingen Liu, Jing Huang, Wei Zhang, Tao Mei
2022 arXiv   pre-print
Based on the vision transformer, our network reuses the high-resolution portion of the latent vector to enforce temporal consistency.  ...  To this end, we propose a novel network to encode face videos into the latent space of StyleGAN for semantic face video manipulation.  ...  The lower-resolution layers control the high-level layout/shape of the face, while the higher-resolution layers control the face details. The space of these vectors is denoted as W+ ∈ R^{18×512}.  ... 
arXiv:2206.13078v1 fatcat:bfbvwxtdf5ftnlq2in4malzov4

Depth-SIMS: Semi-Parametric Image and Depth Synthesis [article]

Valentina Musat, Daniele De Martini, Matthew Gadd, Paul Newman
2022 arXiv   pre-print
the RGB canvases into high quality RGB images and the sparse depth maps into pixel-wise dense depth maps.  ...  In this paper we present a compositing image synthesis method that generates RGB canvases with well aligned segmentation maps and sparse depth maps, coupled with an in-painting network that transforms  ...  We then sample locations from the validity mask M valid depth , and produce a sparse depth map D sparse aligned with depth information taken from the depth map D guide of the guiding semantic layout S  ... 
arXiv:2203.03405v2 fatcat:d72gw6g2h5f45fikoyfxdw5cwu

Semantic-Preserving Word Clouds by Seam Carving

Yingcai Wu, Thomas Provan, Furu Wei, Shixia Liu, Kwan-Liu Ma
2011 Computer graphics forum (Print)  
With seam carving, we can pack the word cloud compactly and effectively, while preserving its overall semantic structure.  ...  Word clouds are proliferating on the Internet and have received much attention in visual analytics.  ...  The created word layout is a preliminary and sparse layout for showing the overall semantic relations among all the extracted keywords.  ... 
doi:10.1111/j.1467-8659.2011.01923.x fatcat:llkpskliurdydcss5jjxwrw4fi

STEEX: Steering Counterfactual Explanations with Semantics [article]

Paul Jacob, Éloi Zablocki, Hédi Ben-Younes, Mickaël Chen, Patrick Pérez, Matthieu Cord
2022 arXiv   pre-print
Leveraging recent semantic-to-image models, we propose a new generative counterfactual explanation framework that produces plausible and sparse modifications which preserve the overall scene structure.  ...  In this work, we address the problem of producing counterfactual explanations for high-quality images and complex scenes.  ...  CelebAMask-HQ contains 30,000 high-quality face portraits with semantic segmentation annotation maps including 19 semantic classes (e.g., skin, mouth, nose, etc.).  ... 
arXiv:2111.09094v3 fatcat:ecl5powswjdb3fwlqzma6476yu

Spatially Multi-conditional Image Generation [article]

Ritika Chakraborty, Nikola Popovic, Danda Pani Paudel, Thomas Probst, Luc Van Gool
2022 arXiv   pre-print
Our choice of spatial conditioning, such as by semantics and depth, is driven by the promise it holds for better control of the image generation process.  ...  The entire dataset consists of over 4 Million images from around 500 different buildings with high resolution RGB images, segmentation masks and other labels.  ...  This model first synthesizes the high-resolution pixels using lightweight and highly parallelizable operators. ASAP performs most computationally expensive image analysis at a very coarse resolution.  ... 
arXiv:2203.13812v2 fatcat:6xjovlkaerf2vb5dtcki7epet4

Deep Image Synthesis from Intuitive User Input: A Review and Perspectives [article]

Yuan Xue, Yuan-Chen Guo, Han Zhang, Tao Xu, Song-Hai Zhang, Xiaolei Huang
2021 arXiv   pre-print
In many applications of computer graphics, art and design, it is desirable for a user to provide intuitive non-image input, such as text, sketch, stroke, graph or layout, and have a computer system automatically  ...  The CelebA-HQ dataset [49] consists of 30,000 high resolution images from the CelebA dataset.  ...  To improve synthesis quality, they proposed a Cascaded Refinement Network (CRN), which progressively generates images from low resolution to high resolution (up to 2 megapixels at 1024x2048 pixel resolution  ... 
arXiv:2107.04240v2 fatcat:ticrsi27nzhozmw7dp7wwja2ni