220 Hits in 4.1 sec

Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding [article]

Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Rapha Gontijo Lopes, Tim Salimans, Jonathan Ho (+2 others)
2022 arXiv   pre-print
We present Imagen, a text-to-image diffusion model with an unprecedented degree of photorealism and a deep level of language understanding.  ...  Imagen builds on the power of large transformer language models in understanding text and hinges on the strength of diffusion models in high-fidelity image generation.  ...  We thank Aditya Ramesh, Prafulla Dhariwal, and Alex Nichol for allowing us to use DALL-E 2 samples and providing us with GLIDE samples.  ... 
arXiv:2205.11487v1 fatcat:bn2oc6ypufddpd7lysh7sc32gu

Discovering the Hidden Vocabulary of DALLE-2 [article]

Giannis Daras, Alexandros G. Dimakis
2022 arXiv   pre-print
We discover that DALLE-2 seems to have a hidden vocabulary that can be used to generate images with absurd prompts. For example, it seems that means birds and (sometimes) means bugs or pests.  ...  We present our black-box method to discover words that seem random but have some correspondence to visual concepts. This creates important security and interpretability challenges.  ...  We would like to thank Ludwig Schmidt, Rachael Tatman and others on Twitter who provided constructive feedback. We also thank OpenAI for providing access to their model through the API.  ... 
arXiv:2206.00169v1 fatcat:me7jhclfj5aynkbvp4hl2simqa

Deep Learning and Synthetic Media [article]

Raphaël Millière
2022 arXiv   pre-print
, and can be indistinguishable from real sounds and images recorded with a sensor.  ...  Synthetic audiovisual media generated with deep learning - often subsumed colloquially under the label "deepfakes" - have a number of impressive characteristics; they are increasingly trivial to produce  ...  It has also become increasingly easy to guide image generation directly with text.  ... 
arXiv:2205.05764v1 fatcat:6th5uy6zifgydj4cbecvs3fw4u

IEEE Access Special Section Editorial: Big Data Learning and Discovery

Zhong-Ke Gao, An-An Liu, Yanhui Wang, Michael Small, Xiaojun Chang, Jurgen Kurths
2021 IEEE Access  
In [A64], Li and Ye proposed a remote sensing image scene understanding method based on deep learning.  ...  trading patterns modeling approach with deep learning in stock trend prediction.  ...  2001 to 2011.  ... 
doi:10.1109/access.2021.3127335 fatcat:apph47tuffblnp2dkcc7lezffy

Neural style-preserving visual dubbing

Hyeongwoo Kim, Mohamed Elgharib, Michael Zollhöfer, Hans-Peter Seidel, Thabo Beeler, Christian Richardt, Christian Theobalt
2019 ACM Transactions on Graphics  
We train our model with unsynchronized source and target videos in an unsupervised manner using cycle-consistency and mouth expression losses, and synthesize photorealistic video frames using a layered  ...  Dubbing is a technique for translating video content from one language to another.  ...  ACKNOWLEDGMENTS We are grateful to all our actors and the reviewers for their valuable feedback.  ... 
doi:10.1145/3355089.3356500 fatcat:dmtkieupszhvbmab466hxlmjyu

VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance [article]

Katherine Crowson and Stella Biderman and Daniel Kornis and Dashiell Stander and Eric Hallahan and Louis Castricato and Edward Raff
2022 arXiv   pre-print
Generating and editing images from open domain text prompts is a challenging task that heretofore has required expensive and specially trained models.  ...  encoder to guide image generations.  ...  We are also indebted to the thousands of people who have used and built upon our methods and given us feedback that has allowed us to continue to improve our techniques.  ... 
arXiv:2204.08583v1 fatcat:tsfe7nozlvgejg2m7lrxlugjta

AI in the media and creative industries [article]

Giuseppe Amato, Malte Behrmann, Frédéric Bimbot, Ander Garcia, Joost Geurts, Jaume Gibert, Guillaume Gravier, Antoine Liutkus, Andrew Perkis, Emmanuel Vincent
2019 arXiv   pre-print
, as opposed to the conventional "Big Data" approach, or the ability to process, analyse and match data from multiple modalities (text, sound, images, etc.) at the same time.  ...  The purpose of this white paper is to understand future technological advances in AI and their growing impact on creative industries.  ...  Extrapolating from the French market, we can reasonably estimate that the annual market for the production of subtitles adapted to audiovisual broadcasters at a global level of several hundreds of millions  ... 
arXiv:1905.04175v1 fatcat:r6w6bord75flli72j5vpv3vvky

Text Data Augmentation for Deep Learning

Connor Shorten, Taghi M. Khoshgoftaar, Borko Furht
2021 Journal of Big Data  
Natural Language Processing (NLP) is one of the most captivating applications of Deep Learning.  ...  We follow these motifs with a concrete list of augmentation frameworks that have been developed for text data.  ...  Acknowledgements We would like to thank the reviewers in the Data Mining and Machine Learning Laboratory at Florida Atlantic University.  ... 
doi:10.1186/s40537-021-00492-0 fatcat:bcbaqkpicnd6dcwc34pdijosby

Cultural heritage conservation and communication by digital modeling tools. Case studies: minor architectures of the Thirties in the Turin area

A. Bruno Jr., R. Spallone
2015 ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences  
The modeling skills are the basis to product videos able to explore the relationship between the environment and "re-built architectures", describing with the synthetic movie techniques, the main formal  ...  The model represents a scientific product that can be involved in a virtual archive of cultural goods to preserve the collective memory of the architectural and urban past image of Turin.  ...  It was required to the producers of these models to adopt a language as simple as possible (De Francesco, D'Andrea, 2008), with the aim to share information and diffuse them in a readily understandable  ... 
doi:10.5194/isprsannals-ii-5-w3-25-2015 fatcat:jzrs5dpnbbfzbdtjxqvq2wqwvy

Text-based editing of talking-head video

Ohad Fried, Maneesh Agrawala, Ayush Tewari, Michael Zollhöfer, Adam Finkelstein, Eli Shechtman, Dan B Goldman, Kyle Genova, Zeyu Jin, Christian Theobalt
2019 ACM Transactions on Graphics  
large variety of edits, such as the addition, removal, and alteration of words, as well as convincing language translation and full sentence synthesis.  ...  Deep Generative Models. Very recently, researchers have proposed Deep Generative Adversarial Networks (GANs) for the synthesis of images and videos.  ...  We use blended head parameters from the corresponding video frames, together with a retimed background sequence, to generate a composite image, which is used to generate a photorealistic frame using our  ... 
doi:10.1145/3306346.3323028 fatcat:4om6pveolfarrje4t4a2be4c2a

Deepwater Archaeological Survey: An Interdisciplinary and Complex Process [chapter]

Pierre Drap, Odile Papini, Djamal Merad, Jérôme Pasquet, Jean-Philip Royer, Mohamad Motasem Nawaf, Mauro Saccone, Mohamed Ben Ellefi, Bertrand Chemisky, Julien Seinturier, Jean-Christophe Sourisseau, Timmy Gambin (+1 others)
2019 Management-Reihe Corporate Social Responsibility  
Here, we introduce the concepts, the developing process, and some results, which we obtained with underwater imaging.  ...  In the same direction, we present the Non-Photorealistic Rendering (NPR) technique, which converts a 3D model into a more readable 2D representation that is more useful to communicate and simplifies the  ...  We would like to thank the artist Cosette Nigon for providing sketches in the deep learning, style transfer approach (Fig. 9.4).  ... 
doi:10.1007/978-3-030-03635-5_9 fatcat:376nb5jzuvdprmqkxzeb32z6zq

GAN Inversion: A Survey [article]

Weihao Xia, Yulun Zhang, Yujiu Yang, Jing-Hao Xue, Bolei Zhou, Ming-Hsuan Yang
2022 arXiv   pre-print
GAN inversion aims to invert a given image back into the latent space of a pretrained GAN model, for the image to be faithfully reconstructed from the inverted code by the generator.  ...  As an emerging technique to bridge the real and fake image domains, GAN inversion plays an essential role in enabling the pretrained GAN models such as StyleGAN and BigGAN to be used for real image editing  ...  PRELIMINARIES GAN Models and Datasets Deep generative models such as GANs [1] have been used to model natural image distributions and synthesize photorealistic images.  ... 
arXiv:2101.05278v5 fatcat:ff3evb2nv5ezzaxju2cucbizde

Pretraining is All You Need for Image-to-Image Translation [article]

Tengfei Wang, Ting Zhang, Bo Zhang, Hao Ouyang, Dong Chen, Qifeng Chen, Fang Wen
2022 arXiv   pre-print
We also propose adversarial training to enhance the texture synthesis in the diffusion model training, in conjunction with normalized guidance sampling to improve the generation quality.  ...  In this paper, we regard each image-to-image translation problem as a downstream task and introduce a simple and generic framework that adapts a pretrained diffusion model to accommodate various kinds  ...  Such an image-to-image translation problem [23] essentially relates to learning the conditional distribution of natural images given the input using deep generative models.  ... 
arXiv:2205.12952v1 fatcat:e5ovdnsj55h6fclqq2yyqfolny

Analyzing Pathfinder data using virtual reality and superresolved imaging

Carol R. Stoker, Eric Zbinden, Theodore T. Blackmon, Bob Kanefsky, Joel Hagen, Charles Neveu, Daryl Rasmussen, Kurt Schwehr, Michael Sims
1999 Journal of Geophysical Research  
From early digital image processing techniques developed during the Viking era [Levinthal et al., 1977; Green, 1977] to our application of photorealistic VR models to the Pathfinder landing site, advances  ...  Photorealistic 3-D models of the terrain surrounding the Pathfinder lander were produced using stereo images from the Imager for Mars Pathfinder (IMP) camera.  ...  Smith and his staff for providing early access to the IMP engineering model, which allowed us to test 3-D modeling and superresolution. We are grateful for the assistance of J. Moore, G. Hovde, S.  ... 
doi:10.1029/1998je900019 fatcat:6uqy3mmhujchbdwialtg4uiire

Opal: Multimodal Image Generation for News Illustration [article]

Vivian Liu, Han Qiao, Lydia Chilton
2022 arXiv   pre-print
Multimodal AI advancements have presented people with powerful ways to create images from text.  ...  In this paper, we address this challenge with Opal, a system that produces text-to-image generations for editorial illustration.  ...  to VQGAN. [2, 12, 13, 17] Newer methods such as diffusion models have also increased output quality. [11, 15, 29, 30] FORMATIVE STUDY In order how a text-to-image model could best augment a news illustrator's  ... 
arXiv:2204.09007v2 fatcat:owtkmwq4rzfc5l7i7mxmhk5zeu