1,056 Hits in 3.7 sec

Video Interpolation and Prediction with Unsupervised Landmarks [article]

Kevin J. Shih, Aysegul Dundar, Animesh Garg, Robert Pottorf, Andrew Tao, Bryan Catanzaro
2019 arXiv   pre-print
This work poses video prediction and interpolation as unsupervised latent structure inference followed by a temporal prediction in this latent space.  ...  Prediction and interpolation for long-range video data involves the complex task of modeling motion trajectories for each visible object, occlusions and dis-occlusions, as well as appearance changes due  ...  Acknowledgements: We would like to thank Alex Lee and Emily Denton for releasing the source code and pre-trained weights for their respective models.  ... 
arXiv:1909.02749v1 fatcat:2xwaxtjc7ndvnhka2ry4qigjue

Unsupervised Discovery of Object Landmarks as Structural Representations [article]

Yuting Zhang, Yijie Guo, Yixin Jin, Yijun Luo, Zhiyuan He, Honglak Lee
2018 arXiv   pre-print
Our discovered landmarks are semantically meaningful and more predictive of manually annotated landmarks than those discovered by previous methods.  ...  In addition, the proposed method naturally creates an unsupervised, perceptible interface to manipulate object shapes and decode images with controllable structures.  ...  Acknowledgements This work was supported in part by ONR N00014-13-1-0762, NSF CAREER IIS-1453651, and Sloan Research Fellowship.  ... 
arXiv:1804.04412v1 fatcat:cabwrmgygfb2ti6hm3cw6wepam

Unsupervised Learning of Object Landmarks through Conditional Image Generation [article]

Tomas Jakab, Ankush Gupta, Hakan Bilen, Andrea Vedaldi
2018 arXiv   pre-print
We demonstrate that our approach can learn object landmarks from synthetic image deformations or videos, all without manual supervision, while outperforming state-of-the-art unsupervised landmark detectors  ...  We propose a method for learning landmark detectors for visual objects (such as the eyes and the nose in a face) without any manual supervision.  ...  We would like to thank James Thewlis for suggestions and support with code and data, and David Novotný and Triantafyllos Afouras for helpful advice.  ... 
arXiv:1806.07823v2 fatcat:wiypxze42vbbfm6pib6rgtcqwq

Deforming Autoencoders: Unsupervised Disentangling of Shape and Appearance [article]

Zhixin Shu, Mihir Sahasrabudhe, Alp Guler, Dimitris Samaras, Nikos Paragios, Iasonas Kokkinos
2018 arXiv   pre-print
We show experiments with expression morphing in humans, hands, and digits, face manipulation, such as shape and appearance interpolation, as well as unsupervised landmark localization.  ...  A more powerful form of unsupervised disentangling becomes possible in template coordinates, allowing us to successfully decompose face images into shading and albedo, and further manipulate face images  ...  Our experiments with expression morphing in humans, image manipulation, such as shape and appearance interpolation, as well as unsupervised landmark localization, show the generality of our approach.  ... 
arXiv:1806.06503v1 fatcat:2y3w7ofn6fhzrkac27gsabrg74

Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors [article]

Xuanyi Dong, Shoou-I Yu, Xinshuo Weng, Shih-En Wei, Yi Yang, Yaser Sheikh
2018 arXiv   pre-print
In this paper, we present supervision-by-registration, an unsupervised approach to improve the precision of facial landmark detectors on both images and video.  ...  With supervision-by-registration, we demonstrate (1) improvements in facial landmark detection on both images (300W, AFLW) and video (300VW, Youtube-Celebrities), and (2) significant reduction of jittering  ...  We filter videos with low resolution, and use the remaining videos to train SBR in an unsupervised way. 300-VW [6, 31, 35]. This video dataset contains 50 training videos with 95192 frames.  ... 
arXiv:1807.00966v2 fatcat:tqp6x5qxxjcbxaacpzozn5455u

Supervision-by-Registration: An Unsupervised Approach to Improve the Precision of Facial Landmark Detectors

Xuanyi Dong, Shoou-I Yu, Xinshuo Weng, Shih-En Wei, Yi Yang, Yaser Sheikh
2018 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition  
In this paper, we present supervision-by-registration, an unsupervised approach to improve the precision of facial landmark detectors on both images and video.  ...  With supervision-by-registration, we demonstrate (1) improvements in facial landmark detection on both images (300W, AFLW) and video (300VW, Youtube-Celebrities), and (2) significant reduction of jittering  ...  We filter videos with low resolution, and use the remaining videos to train SBR in an unsupervised way. 300-VW [6, 31, 35]. This video dataset contains 50 training videos with 95192 frames.  ... 
doi:10.1109/cvpr.2018.00045 dblp:conf/cvpr/DongYWW0S18 fatcat:wdtpkpvojva2nm4pqlcisqwxmu

Neural Head Reenactment with Latent Pose Descriptors [article]

Egor Burkov, Igor Pasechnik, Artur Grigorev, Victor Lempitsky
2020 arXiv   pre-print
We show that despite its simplicity, with a large and diverse enough training dataset, such learning successfully decomposes pose from identity.  ...  Additionally, we show that the learned descriptors are useful for other pose-related tasks, such as keypoint prediction and pose-based retrieval.  ...  We employ an off-the-shelf 2D facial landmarks prediction algorithm [2] L to obtain landmarks in both the driver I k j and the reenactment result T k (I k j ).  ... 
arXiv:2004.12000v1 fatcat:y3s3xkidvvhelc4gla3r5oos4a

Unsupervised Learning of Monocular Depth Estimation with Bundle Adjustment, Super-Resolution and Clip Loss [article]

Lipu Zhou, Jiamin Ye, Montiel Abello, Shengze Wang, Michael Kaess
2018 arXiv   pre-print
We present a novel unsupervised learning framework for single view depth estimation using monocular videos.  ...  Additionally, we introduce the clip loss to deal with moving objects and occlusion.  ...  This paper focuses on unsupervised learning using monocular videos and seeks to reduce this gap.  ... 
arXiv:1812.03368v1 fatcat:462ptqvitjcmlmhig6im2hmmke

Audio-Driven Emotional Video Portraits [article]

Xinya Ji, Hang Zhou, Kaisiyuan Wang, Wayne Wu, Chen Change Loy, Xun Cao, Feng Xu
2021 arXiv   pre-print
., a duration-independent emotion space and a duration dependent content space. With the disentangled features, dynamic 2D emotional facial landmarks can be deduced.  ...  Then we propose the Target-Adaptive Face Synthesis technique to generate the final high-quality video portraits, by bridging the gap between the deduced landmarks and the natural head poses of target videos  ...  the Beijing Natural Science Foundation (JQ19015), the NSFC (No.61822111, 61727808, 61627804), the NSFJS (BK20192003), partly by Leading Technology of Jiangsu Basic Research Plan under Grant BK2019200, and  ... 
arXiv:2104.07452v2 fatcat:mp6zh2bxtnc7hilkldxzzokaxi

Deformable Generator Network: Unsupervised Disentanglement of Appearance and Geometry [article]

Xianglei Xing, Ruiqi Gao, Tian Han, Song-Chun Zhu, Ying Nian Wu
2020 arXiv   pre-print
We present a deformable generator model to disentangle the appearance and geometric information for both image and video data in a purely unsupervised manner.  ...  Two generators take independent latent vectors as input to disentangle the appearance and geometric information from image or video sequences.  ...  landmark prediction on the MAFL test set.  ... 
arXiv:1806.06298v3 fatcat:cwx4l5crqjhsfagzctvoylvivq

Supervision by Registration and Triangulation for Landmark Detection

Xuanyi Dong, Yi Yang, Shih-En Wei, Xinshuo Weng, Yaser Sheikh, Shoou-I Yu
2020 IEEE Transactions on Pattern Analysis and Machine Intelligence  
We present Supervision by Registration and Triangulation (SRT), an unsupervised approach that utilizes unlabeled multi-view video to improve the accuracy and precision of landmark detectors.  ...  Experiments with 11 datasets and a newly proposed metric to measure precision demonstrate accuracy and precision improvements in landmark detection on both images and video.  ...  and precision in both images and videos, more stable predictions in videos, and more consistent predictions in different views.  ... 
doi:10.1109/tpami.2020.2983935 pmid:32248096 fatcat:qo7zjzlxarf47iyvobzyvjwtqm

"Look Ma, No Landmarks!" – Unsupervised, Model-Based Dense Face Alignment [chapter]

Tatsuro Koizumi, William A. P. Smith
2020 Lecture Notes in Computer Science  
In this paper, we show how to train an image-to-image network to predict dense correspondence between a face image and a 3D morphable model using only the model for supervision.  ...  The least squares residuals provide an unsupervised training signal that allows us to avoid artefacts common in the literature such as shrinking and conservative underfitting.  ...  [39] and compare our result with supervised facial landmark detection methods. We evaluate landmarks obtained from both direct correspondence and fitted model.  ... 
doi:10.1007/978-3-030-58536-5_41 fatcat:wq3ymqnkpbd73i5ido3krtzoku

Teacher-Student Asynchronous Learning with Multi-Source Consistency for Facial Landmark Detection [article]

Rongye Meng, Sanping Zhou, Xingyu Wan, Mengliu Li, Jinjun Wang
2020 arXiv   pre-print
Due to the high annotation cost of large-scale facial landmark detection tasks in videos, a semi-supervised paradigm that uses self-training for mining high-quality pseudo-labels to participate in training  ...  And extensive experiments on 300W, AFLW, and 300VW benchmarks show that the TSAL framework achieves state-of-the-art performance.  ...  Video The video dataset used was the 300VW dataset (Shen et al. 2015) that contains 50 training videos with 95192 frames.  ... 
arXiv:2012.06711v1 fatcat:xe4elcr5nfh5ninytwo5fau2yi

Physics Driven Domain Specific Transporter Framework with Attention Mechanism for Ultrasound Imaging [article]

Arpan Tripathi, Abhilash Rakkunedeth, Mahesh Raveendranatha Panicker, Jack Zhang, Naveenjyote Boora, Jessica Knight, Jacob Jaremko, Yale Tung Chen, Kiran Vishnu Narayan, Kesavadas C
2021 arXiv   pre-print
The proposed framework has been trained on 130 Lung ultrasound (LUS) videos and 113 Wrist ultrasound (WUS) videos and validated on 100 Lung ultrasound (LUS) videos and 58 Wrist ultrasound (WUS) videos acquired  ...  In this paper, we propose an unsupervised, physics driven domain specific transporter framework with an attention mechanism to identify relevant key points with applications in ultrasound imaging.  ...  Semantic segmentation in natural images and videos has been approached using unsupervised techniques [20].  ... 
arXiv:2109.06346v1 fatcat:n2w2ykhkbfhvfmv3gifqqttq2q

The Sparse Manifold Transform [article]

Yubei Chen, Dylan M. Paiton, Bruno A. Olshausen
2018 arXiv   pre-print
We provide a theoretical description of the transform and demonstrate properties of the learned representation on both synthetic data and natural videos.  ...  The sparse manifold transform is an unsupervised and generative framework that explicitly and simultaneously models the sparse discreteness and low-dimensional manifold structure found in natural scenes  ...  At each time, t, we use a nearest neighbor (KNN) solver to find a local linear interpolation of the point’s location from the landmarks, that is x_t = Φ_LM α_t, with α_t ∈ ℝ^300 and α_t ⪰ 0 (the choice of  ... 
arXiv:1806.08887v2 fatcat:zmurlcmff5f7pokdnpochlhlaq