Model-based occlusion disentanglement for image-to-image translation [article]

Fabio Pizzati, Pietro Cerri, Raoul de Charette
2020 arXiv   pre-print
Our unsupervised model-based learning disentangles scene and occlusions, while benefiting from an adversarial pipeline to regress physical parameters of the occlusion model.  ...  The experiments demonstrate our method is able to handle varying types of occlusions and generate highly realistic translations, qualitatively and quantitatively outperforming the state-of-the-art on multiple  ...  Model-based disentanglement We aim to learn the disentangled representation of a target domain and occlusions.  ... 
arXiv:2004.01071v2 fatcat:d7wsgisdljf5jeqdufneekhs2e

Physics-informed Guided Disentanglement in Generative Networks [article]

Fabio Pizzati, Pietro Cerri, Raoul de Charette
2022 arXiv   pre-print
In this paper, we build upon collection of simple physics models and present a comprehensive method for disentangling visual traits in target images, guiding the process with a physical model that renders  ...  Image-to-image translation (i2i) networks suffer from entanglement effects in presence of physics-related phenomena in target domain (such as occlusions, fog, etc), lowering altogether the translation  ...  DirtyGAN [96] is a GAN-based framework for opaque soiling occlusion generation.  ... 
arXiv:2107.14229v3 fatcat:sgoocc6qr5h2ldjlpxgpj7oaf4

DeepRM: Deep Recurrent Matching for 6D Pose Refinement [article]

Alexander Avery, Andreas Savakis
2022 arXiv   pre-print
In contrast to many 2-stage Perspective-n-Point based solutions, DeepRM is trained end-to-end, and uses a scalable backbone that can be tuned via a single parameter for accuracy and efficiency.  ...  Optical flow prediction stabilizes the training process, and enforces the learning of features that are relevant to the task of pose estimation.  ...  For both YCB-Video and Occlusion LINEMOD datasets, we use the ADAM optimizer [33] , with a base learning rate of 1e-4.  ... 
arXiv:2205.14474v3 fatcat:5ngkjid34bf6roeakn4ja6nazu

Pose-guided Feature Disentangling for Occluded Person Re-identification Based on Transformer [article]

Tao Wang, Hong Liu, Pinhao Song, Tianyu Guo, Wei Shi
2021 arXiv   pre-print
Fourth, to better prevent the interference of occlusions, we design a Pose-guided Push Loss to emphasize the features of visible body parts.  ...  Therefore, we propose a transformer-based Pose-guided Feature Disentangling (PFD) method by utilizing pose information to clearly disentangle semantic components (e.g. human body or joint parts) and selectively  ...  Transreid: Transformer-based object re-identification. In Proceedings of the IEEE/CVF International Conference  ...  ference of occlusion noises by pushing the distance between  ... 
arXiv:2112.02466v2 fatcat:37hftouzwrfivpugzvqvfqrwkm

Latents2Segments: Disentangling the Latent Space of Generative Models for Semantic Segmentation of Face Images [article]

Snehal Singh Tomar, A.N. Rajagopalan
2022 arXiv   pre-print
infusion of disentanglement with respect to facial semantic regions of interest (ROIs) in the latent space of a Generative Autoencoder model.  ...  The encoded latent space of our model achieves significantly higher disentanglement with respect to semantic ROIs than that of other SOTA works.  ...  Acknowledgement: Support from Institute of Eminence (IoE) project No. SB20210832EEMHRD005001 is gratefully acknowledged.  ... 
arXiv:2207.01871v2 fatcat:dezbnobiybbsfp4zz7fjyo7umi

Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation [article]

Jogendra Nath Kundu, Siddharth Seth, Rahul M V, Mugalodi Rakesh, R. Venkatesh Babu, Anirban Chakraborty
2020 arXiv   pre-print
Though weakly-supervised models have been proposed to address this shortcoming, performance of such models relies on availability of paired supervision on some related tasks, such as 2D pose or multi-view  ...  Furthermore, devoid of unstable adversarial setup, we re-utilize the decoder to formalize an energy-based loss, which enables us to learn from in-the-wild videos, beyond laboratory settings.  ...  This work was supported by a Wipro PhD Fellowship (Jogendra) and in part by DST, Govt. of India (DST/INT/UK/P-179/2017).  ... 
arXiv:2006.14107v1 fatcat:r3c3m6ugx5bxhfrepsake366t4

Deep Optics for Monocular Depth Estimation and 3D Object Detection [article]

Julie Chang, Gordon Wetzstein
2019 arXiv   pre-print
We find an optimized freeform lens design yields the best results, but chromatic aberration from a singlet lens offers significantly improved performance as well.  ...  Here we introduce the paradigm of deep optics, i.e. end-to-end design of optics and image processing, to the monocular depth estimation problem, using coded defocus blur as an additional depth cue to be  ...  There may be simulation inaccuracies that are not straightforward to disentangle unless the entire dataset was recaptured through the different lenses.  ... 
arXiv:1904.08601v1 fatcat:ok5u3wmykrhlnm55ypnzkcfope

Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation

Jogendra Nath Kundu, Siddharth Seth, Rahul M V, Mugalodi Rakesh, Venkatesh Babu Radhakrishnan, Anirban Chakraborty
2020 PROCEEDINGS OF THE THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE AND THE TWENTY-EIGHTH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE  
Though weakly-supervised models have been proposed to address this shortcoming, performance of such models relies on availability of paired supervision on some related task, such as 2D pose or multi-view  ...  Furthermore, devoid of unstable adversarial setup, we re-utilize the decoder to formalize an energy-based loss, which enables us to learn from in-the-wild videos, beyond laboratory settings.  ...  This work was supported by a Wipro PhD Fellowship (Jogendra) and in part by DST, Govt. of India (DST/INT/UK/P-179/2017).  ... 
doi:10.1609/aaai.v34i07.6792 fatcat:f5vmvnzf5zcklewmlr5kjeryvi

Learning Generative Models of Textured 3D Meshes from Real-World Images [article]

Dario Pavllo, Jonas Kohler, Thomas Hofmann, Aurelien Lucchi
2021 arXiv   pre-print
These models natively disentangle pose and appearance, enable downstream applications in computer graphics, and improve the ability of generative models to understand the concept of image formation.  ...  Recent advances in differentiable rendering have sparked an interest in learning generative models of textured 3D meshes from image collections.  ...  In addition, we provide a preliminary analysis of the disentanglement properties learned by these models. • We propose a comprehensive 3D pose estimation framework that combines the merits of template-based  ... 
arXiv:2103.15627v2 fatcat:urptzdk77fevnghh7hpbpj6bzm

CausalX: Causal Explanations and Block Multilinear Factor Analysis [article]

M. Alex O. Vasilescu, Eric Kim, Xiao S. Zeng
2021 arXiv   pre-print
Learning a part-based intrinsic causal factor representations in a multilinear framework requires applying a set of interventions on a part-based multilinear model.  ...  We propose a unified multilinear model of wholes and parts.  ...  ACKNOWLEDGEMENT The authors are thankful to Ernest Davis for feedback provided during the writing of this document, and to Donald Rubin and Andrew Gelman for helpful discussions.  ... 
arXiv:2102.12853v2 fatcat:3dizqcutdjfq5dubsodqja2h7e

HVH: Learning a Hybrid Neural Volumetric Representation for Dynamic Hair Performance Capture [article]

Ziyan Wang, Giljoo Nam, Tuur Stuyck, Stephen Lombardi, Michael Zollhoefer, Jessica Hodgins, Christoph Lassner
2021 arXiv   pre-print
our model, we further optimize the 3D scene flow of our representation with multi-view optical flow, using volumetric ray marching.  ...  In this paper, we address the aforementioned problems: 1) we use a novel, volumetric hair representation that is composed of thousands of primitives.  ...  Image-based Hair Geometry Acquisition is challenging due to the complicated hair geometry, massive number of strands, severe self occlusion and collision and view-dependent appearance. Paris et al.  ... 
arXiv:2112.06904v3 fatcat:ifuo7ngmvngwpgftkx42gmmz4m

PMD-Net: Privileged Modality Distillation Network for 3D Hand Pose Estimation from a Single RGB Image

Kewen Wang, Xilin Chen
2020 British Machine Vision Conference  
3D Hand Pose Estimation from a single RGB image is a challenging task due to the significant depth ambiguities and occlusions.  ...  In this paper, we propose a Privileged Modality Distillation Network (PMD-Net), which improves the RGB-based hand pose estimation by excavating the privileged information from depth prior during training  ...  In addition, Yang and Yao [32] propose a disentangled VAE to learn disentangled representations of hand poses and hand images.  ... 
dblp:conf/bmvc/WangC20 fatcat:ocwmvsoauzb3ddbbztkj57a7uq

Compositional_Hierarchical_Tensor_Factorization_KDD[1].pdf [article]

Alex Vasilescu, Eric Kim
2020 figshare.com  
, this paper proposes a unified tensor model of wholes and parts, and introduces a compositional hierarchical tensor factorization that disentangles the hierarchical causal structure of object image formation  ...  The resulting object representation is an interpretable combinatorial choice of wholes' and parts' representations that renders object recognition robust to occlusion and reduces training data requirements  ...  (b) These set of images demonstrate the models ability to disentangle the causal factors.  ... 
doi:10.6084/m9.figshare.12981263.v1 fatcat:gcuuzfctnve4dp2ugt4kzrlje4

A Hardware Realization of Superresolution Combining Random Coding and Blurring [article]

Kevin Beale, Jianbo Chen (Rice University), Kevin F. Kelly, Justin Romberg (Georgia Institute of Technology)
2018 arXiv   pre-print
These methods often require movement of some portion of the imaging apparatus or only acquire images up to the resolution of a modulating element.  ...  Here a technique is presented for resolving beyond the resolutions of both a pointwise-modulating mask element and a sensor array through the introduction of a controlled blur into the optical pathway.  ...  Calibration Ideally the blur kernel can be estimated based on the distance between the second lens and the FPA.  ... 
arXiv:1810.08855v1 fatcat:tyuep2guwvaohdc2a2fbtaww54

Compositional Hierarchical Tensor Factorization: Representing Hierarchical Intrinsic and Extrinsic Causal Factors [article]

M. Alex O. Vasilescu, Eric Kim
2020 arXiv   pre-print
Therefore, this paper proposes a unified tensor model of wholes and parts, and introduces a compositional hierarchical tensor factorization that disentangles the hierarchical causal structure of object  ...  The resulting object representation is an interpretable combinatorial choice of wholes' and parts' representations that renders object recognition robust to occlusion and reduces training data requirements  ...  (b) These set of images demonstrate the models ability to disentangle the causal factors.  ... 
arXiv:1911.04180v2 fatcat:gj4nx5ugbjd7nf7cpdcbqnwbee