Unsupervised Disentanglement of Pose, Appearance and Background from Images and Videos [article]

Aysegul Dundar, Kevin J. Shih, Animesh Garg, Robert Pottorf, Andrew Tao, Bryan Catanzaro
2020 arXiv   pre-print
A popular approach is to factorize an image into a pose and appearance data stream, then to reconstruct the image from the factorized components.  ...  Furthermore, the rendered background quality is also improved, as the background rendering pipeline no longer requires the ill-suited landmarks to model its pose and appearance.  ...  The model must learn to extract the pose information from the appearance-perturbed image, and appearance information from the pose-perturbed image.  ... 
arXiv:2001.09518v1 fatcat:iz6pjfea4fbwdlp5k7gtwp223e

Unsupervised Part-Based Disentangling of Object Shape and Appearance [article]

Dominik Lorenz, Leonard Bereska, Timo Milbich, Björn Ommer
2019 arXiv   pre-print
We evaluate our approach on a wide range of object categories and diverse tasks including pose prediction, disentangled image synthesis, and video-to-video translation.  ...  We present an unsupervised approach for disentangling appearance and shape by learning parts consistently over all instances of a category.  ...  Figure 1: Our unsupervised learning of a disentangled part-based shape and appearance enables numerous tasks ranging from unsupervised pose estimation to image synthesis and retargeting.  ... 
arXiv:1903.06946v3 fatcat:bppuidyf7ngajlphwc4aikhnjy

Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation [article]

Jogendra Nath Kundu, Siddharth Seth, Rahul M V, Mugalodi Rakesh, R. Venkatesh Babu, Anirban Chakraborty
2020 arXiv   pre-print
Estimation of 3D human pose from monocular image has gained considerable attention, as a key step to several human-centric applications.  ...  Comprehensive experiments demonstrate our state-of-the-art unsupervised and weakly-supervised pose estimation performance on both Human3.6M and MPI-INF-3DHP datasets.  ...  This work was supported by a Wipro PhD Fellowship (Jogendra) and in part by DST, Govt. of India (DST/INT/UK/P-179/2017).  ... 
arXiv:2006.14107v1 fatcat:r3c3m6ugx5bxhfrepsake366t4

Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation

Jogendra Nath Kundu, Siddharth Seth, Rahul M V, Mugalodi Rakesh, Venkatesh Babu Radhakrishnan, Anirban Chakraborty
2020 PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE  
Estimation of 3D human pose from monocular image has gained considerable attention, as a key step to several human-centric applications.  ...  Comprehensive experiments demonstrate our state-of-the-art unsupervised and weakly-supervised pose estimation performance on both Human3.6M and MPI-INF-3DHP datasets.  ...  This work was supported by a Wipro PhD Fellowship (Jogendra) and in part by DST, Govt. of India (DST/INT/UK/P-179/2017).  ... 
doi:10.1609/aaai.v34i07.6792 fatcat:f5vmvnzf5zcklewmlr5kjeryvi

Unsupervised Temporal Learning on Monocular Videos for 3D Human Pose Estimation [article]

Sina Honari, Victor Constantin, Helge Rhodin, Mathieu Salzmann, Pascal Fua
2022 arXiv   pre-print
In this paper we propose an unsupervised learning method to extract temporal information on monocular videos, where we detect and encode subject of interest in each frame and leverage contrastive self-supervised  ...  Instead of simply treating the latent features of nearby frames as positive pairs and those of temporally-distant ones as negative pairs as in other CSS approaches, we explicitly disentangle each latent  ...  We also would like to thank SwissTiming for their support and cooperation in this project.  ... 
arXiv:2012.01511v3 fatcat:dlgqymmyvfcz3brqaypi4bihme

Unsupervised Learning of Object Landmarks through Conditional Image Generation [article]

Tomas Jakab, Ankush Gupta, Hakan Bilen, Andrea Vedaldi
2018 arXiv   pre-print
We demonstrate that our approach can learn object landmarks from synthetic image deformations or videos, all without manual supervision, while outperforming state-of-the-art unsupervised landmark detectors  ...  We cast this as the problem of generating images that combine the appearance of the object as seen in a first example image with the geometry of the object as seen in a second example image, where the  ...  We would like to thank James Thewlis for suggestions and support with code and data, and David Novotný and Triantafyllos Afouras for helpful advice.  ... 
arXiv:1806.07823v2 fatcat:wiypxze42vbbfm6pib6rgtcqwq

Motion Representations for Articulated Animation [article]

Aliaksandr Siarohin, Oliver J. Woodford, Jian Ren, Menglei Chai, Sergey Tulyakov
2021 arXiv   pre-print
To facilitate animation and prevent the leakage of the shape of the driving object, we disentangle shape and pose of objects in the region space.  ...  To force decoupling of foreground from background, we model non-object related global motion with an additional affine transformation.  ...  Finally, to prevent shape transfer and improve animation, we disentangle the shape and pose of objects in the space of unsupervised regions.  ... 
arXiv:2104.11280v1 fatcat:ou7hevqtsveslhltjb4qt7mnom

Self-Supervised 3D Human Pose Estimation via Part Guided Novel Image Synthesis [article]

Jogendra Nath Kundu, Siddharth Seth, Varun Jampani, Mugalodi Rakesh, R. Venkatesh Babu, Anirban Chakraborty
2020 arXiv   pre-print
Acknowledging this, we propose a self-supervised learning framework to disentangle such variations from unlabeled video frames.  ...  Performance of supervised 3D pose estimation approaches comes at the cost of dispensing with variations, such as shape and appearance, that may be useful for solving other related tasks.  ...  Approach: We develop a differentiable framework for self-supervised disentanglement of 3D pose and foreground appearance from in-the-wild video frames of human activity.  ... 
arXiv:2004.04400v1 fatcat:gxjrtfgkpff2peilkm3lhvx2oq

Disentangling Motion, Foreground and Background Features in Videos [article]

Xunyu Lin, Victor Campos, Xavier Giro-i-Nieto, Jordi Torres, Cristian Canton Ferrer
2017 arXiv   pre-print
Qualitative results indicate that the network can successfully segment foreground and background in videos as well as update the foreground appearance based on disentangled motion features.  ...  A preliminary supervised experiment was conducted to verify the feasibility of proposed method by training the model with a fraction of videos from the UCF-101 dataset taking as ground truth the bounding  ...  The contribution from the Barcelona Supercomputing Center has been supported by project TIN2015-65316 by the Spanish Ministry of Science and Innovation contracts 2014-SGR-1051 by Generalitat de Catalunya  ... 
arXiv:1707.04092v2 fatcat:rfzk4sazbzaghaino5nfwxbc2m

LatentKeypointGAN: Controlling GANs via Latent Keypoints [article]

Xingzhe He, Bastian Wandt, Helge Rhodin
2021 arXiv   pre-print
In addition, the explicit generation of keypoints and matching images enables a new, GAN-based method for unsupervised keypoint detection.  ...  portraits by combining the eyes, nose, and mouth from different images.  ...  ACKNOWLEDGEMENT This work was supported by the UBC Advanced Research Computing (ARC) GPU cluster, the Compute Canada GPU servers, and a Huawei-UBC Joint Lab project.  ... 
arXiv:2103.15812v3 fatcat:swpkydxknjccxg7fu35f4kljuy

Unsupervised Part Discovery by Unsupervised Disentanglement [article]

Sandro Braun, Patrick Esser, Björn Ommer
2020 arXiv   pre-print
Our approach leverages a generative model consisting of two disentangled representations for an object's shape and appearance and a latent variable for the part segmentation.  ...  From a single image, the trained model infers a semantic part segmentation map.  ...  DeepFashion contains strong variations in viewpoints, poses and appearances but only a simple background. Exercise has strong pose variation but only simple appearances and a simple background.  ... 
arXiv:2009.04264v2 fatcat:o7lnhjovhfe5pe5qwxpevf3swu

MixNMatch: Multifactor Disentanglement and Encoding for Conditional Image Generation [article]

Yuheng Li, Krishna Kumar Singh, Utkarsh Ojha, Yong Jae Lee
2020 arXiv   pre-print
We present MixNMatch, a conditional generative model that learns to disentangle and encode background, object pose, shape, and texture from real images with minimal supervision, for mix-and-match image  ...  We build upon FineGAN, an unconditional generative model, to learn the desired disentanglement and image generator, and leverage adversarial joint image-code distribution matching to learn the latent factor  ...  This work was supported in part by NSF CAREER IIS-1751206, IIS-1748387, IIS-1812850, AWS ML Research Award, Adobe Data Science Research Award, and Google Cloud Platform research credits.  ... 
arXiv:1911.11758v3 fatcat:wdsmyvx3zzgd3pp6gucosm4ttq

Video Content Swapping Using GAN [article]

Tingfung Lau, Sailun Xu, Xinze Wang
2021 arXiv   pre-print
We first extract the pose information from a video using a pre-trained human pose detection and use a generative model to synthesize the video based on the content code and pose code.  ...  These deep generative models provide a way to utilize all the unlabeled images and videos online, since they can learn deep feature representations in an unsupervised manner.  ...  Secondly, the system should distinguish between foreground and background images of the video.  ... 
arXiv:2111.10916v1 fatcat:56navsdz7jan3g244ceg275wxa

HoloGAN: Unsupervised learning of 3D representations from natural images [article]

Thu Nguyen-Phuoc, Chuan Li, Lucas Theis, Christian Richardt, Yong-Liang Yang
2019 arXiv   pre-print
Our experiments show that using explicit 3D features enables HoloGAN to disentangle 3D pose and identity, which is further decomposed into shape and appearance, while still being able to generate images  ...  We propose a novel generative adversarial network (GAN) for the task of unsupervised learning of 3D representations from natural images.  ...  Acknowledgements We received support from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 665992, the EPSRC Centre for Doctoral  ... 
arXiv:1904.01326v2 fatcat:qqnuq43y45ce3lqkliuua3v4mm

Neural Head Reenactment with Latent Pose Descriptors [article]

Egor Burkov, Igor Pasechnik, Artur Grigorev, Victor Lempitsky
2020 arXiv   pre-print
The latent pose representation is learned as a part of the entire reenactment system, and the learning process is based solely on image reconstruction losses.  ...  We propose a neural head reenactment system, which is driven by a latent pose representation and is capable of predicting the foreground segmentation alongside the RGB image.  ...  We sampled 1 of every 25 frames from each video, leaving around seven million of total training images.  ... 
arXiv:2004.12000v1 fatcat:y3s3xkidvvhelc4gla3r5oos4a