Deep Predictive Neural Network: Unsupervised Learning for Hand Pose Estimation

Jamal Banzi (Department of Electronic Engineering and Information Science, School of Information Science and Technology, University of Science and Technology of China, 230026, Hefei city, Anhui Province, P.R. China), Isack Bulugu, Zhongfu Ye
2019 International Journal of Machine Learning and Computing  
The discriminative approaches for hand pose estimation from depth images usually require dense annotated data to train a supervised network.  ...  Index Terms-Deep learning, hand pose estimation, joint regression, predictive neural networks. Manuscript  ...  poses and corresponding depth images for estimating 3D hand pose.  ... 
doi:10.18178/ijmlc.2019.9.4.822 fatcat:wiij7s7cmjegrivfs6tk2fy7da

End-to-end Global to Local CNN Learning for Hand Pose Recovery in Depth Data [article]

Meysam Madadi, Sergio Escalera, Xavier Baro, Jordi Gonzalez
2018 arXiv   pre-print
Despite recent advances in 3D pose estimation of human hands, especially thanks to the advent of CNNs and depth cameras, this task is still far from being solved.  ...  estimations, outperforming state-of-the-art results on NYU and SyntheticHand datasets.  ...  We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp GPU used for this research.  ... 
arXiv:1705.09606v2 fatcat:ntnfaa4pxfe2jnzjhts7rw7gpu

NeurAll: Towards a Unified Model for Visual Perception in Automated Driving [article]

Ganesh Sistu, Isabelle Leang, Sumanth Chennupati, Ciaran Hughes, Stefan Milz, Senthil Yogamani, Samir Rawashdeh
2019 arXiv   pre-print
Convolutional Neural Networks (CNNs) are successfully used for the important automotive visual perception tasks including object recognition, motion and depth estimation, visual SLAM, etc.  ...  Indeed, the main bottleneck in automated driving systems is the limited processing power available on deployment hardware.  ...  Depth regression decoder was constructed similar to FCN8 [33] decoder except the final layer was replaced with regression units instead of softmax to estimate depth.  ... 
arXiv:1902.03589v2 fatcat:ed6vnc2b7reqneiokvv4bd6hg4

Disentangling 3D Pose in A Dendritic CNN for Unconstrained 2D Face Alignment [article]

Amit Kumar, Rama Chellappa
2018 arXiv   pre-print
Following a Bayesian formulation, we disentangle the 3D pose of a face image explicitly by conditioning the landmark estimation on pose, making it different from multi-tasking approaches.  ...  Instead of increasing depth or width of the network, we train the CNN efficiently with Mask-Softmax Loss and hard sample mining to achieve up to 15% reduction in error compared to state-of-the-art methods  ...  On the other hand, 3D pose is fairly stable to them and can be estimated directly from 2D image [31].  ... 
arXiv:1802.06713v3 fatcat:tlmpaleaf5etzbtna2hvsafovu

Disentangling 3D Pose in a Dendritic CNN for Unconstrained 2D Face Alignment

Amit Kumar, Rama Chellappa
2018 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition  
Following a Bayesian formulation, we disentangle the 3D pose of a face image explicitly by conditioning the landmark estimation on pose, making it different from multi-tasking approaches.  ...  Instead of increasing depth or width of the network, we train the CNN efficiently with Mask-Softmax Loss and hard sample mining to achieve up to 15% reduction in error compared to state-of-the-art methods  ...  On the other hand, 3D pose is fairly stable to them and can be estimated directly from 2D image [30].  ... 
doi:10.1109/cvpr.2018.00052 dblp:conf/cvpr/KumarC18 fatcat:qicy2r7emrclnhh2yl7zcbr5lq

3D Hand Pose Estimation using Simulation and Partial-Supervision with a Shared Latent Space [article]

Masoud Abdi, Ehsan Abbasnejad, Chee Peng Lim, Saeid Nahavandi
2018 arXiv   pre-print
Tremendous amounts of expensive annotated data are a vital ingredient for state-of-the-art 3d hand pose estimation.  ...  Accordingly, we form a shared latent space between three modalities: synthetic depth image, real depth image, and pose.  ...  [28] propose a transductive regression forest that uses unlabeled and synthetic data to estimate the 3d hand pose. Shrivastava et al.  ... 
arXiv:1807.05380v1 fatcat:vpbv43uloba5lfabb67f2akezi

Foreground-aware Dense Depth Estimation for 360 Images

Qi Feng, Hubert P. H. Shum, Ryo Shimamura, Shigeo Morishima
2020 Journal of WSCG  
We further propose a novel auxiliary deep neural network to estimate both the depth of the omnidirectional images and the mask of the foreground objects, where the two tasks facilitate each other.  ...  However, existing depth estimation approaches produce sub-optimal results on real-world omnidirectional images with dynamic foreground objects.  ...  We then use augmented data to train depth estimation networks with the auxiliary MaskNet, and verified that the local depth loss can successfully improve the consistency of estimated depth within areas  ... 
doi:10.24132/jwscg.2020.28.10 fatcat:myjvc7kabrgivjs2ljrjms7p7q

BiHand: Recovering Hand Mesh with Multi-stage Bisected Hourglass Networks [article]

Lixin Yang, Jiasen Li, Wenqiang Xu, Yiqun Diao, Cewu Lu
2020 arXiv   pre-print
For quantitative evaluation, we conduct experiments on two public benchmarks, namely the Rendered Hand Dataset (RHD) and the Stereo Hand Pose Tracking Benchmark (STB).  ...  3D hand estimation has been a long-standing research topic in computer vision. A recent trend aims not only to estimate the 3D hand joint locations but also to recover the mesh model.  ...  Related Work Our method closely relates to 3D hand pose estimation and 3D hand mesh reconstruction problems. 3D Hand Pose Estimation Early works on 3D hand pose estimation mainly focused on regressing  ... 
arXiv:2008.05079v1 fatcat:z4hc3gyfq5a7vb35xdwil47cfa

DeepRM: Deep Recurrent Matching for 6D Pose Refinement [article]

Alexander Avery, Andreas Savakis
2022 arXiv   pre-print
The rendered images are then matched with the observed images to predict a rigid transform for updating the previous pose estimate.  ...  To address this problem, we propose DeepRM, a novel recurrent network architecture for 6D pose refinement. DeepRM leverages initial coarse pose estimates to render synthetic images of target objects.  ...  The proposed DeepRM method improves upon DeepIM [2] with several innovations, such as high resolution cropping, disentangled loss, variable renderer brightness, a scalable backbone based on EfficientNet  ... 
arXiv:2205.14474v2 fatcat:3rxvypntobh7rdpzzcbkwar2pe

Modelling Uncertainty in Deep Learning for Camera Relocalization [article]

Alex Kendall, Roberto Cipolla
2016 arXiv   pre-print
We use a Bayesian convolutional neural network to regress the 6-DOF camera pose from a single RGB image.  ...  Using a Bayesian convolutional neural network implementation we obtain an estimate of the model's relocalization uncertainty and improve state of the art localization accuracy on a large scale outdoor  ...  At a shallower depth, with the first auxiliary pose regressor (green), the results are multi-modal. This is especially true for visually ambiguous images such as (c) in figure 2.  ... 
arXiv:1509.05909v2 fatcat:zarj7j42s5f2rehlkoikfjx6ve

Modelling uncertainty in deep learning for camera relocalization

Alex Kendall, Roberto Cipolla
2016 2016 IEEE International Conference on Robotics and Automation (ICRA)  
We use a Bayesian convolutional neural network to regress the 6-DOF camera pose from a single RGB image.  ...  Using a Bayesian convolutional neural network implementation we obtain an estimate of the model's relocalization uncertainty and improve state of the art localization accuracy on a large scale outdoor  ...  At a shallower depth, with the first auxiliary pose regressor (green), the results are multi-modal. This is especially true for visually ambiguous images such as (c) in Figure 2 .  ... 
doi:10.1109/icra.2016.7487679 dblp:conf/icra/KendallC16 fatcat:7zfus43jt5efnpebfhiohmi4aq

Novel View Synthesis for Large-scale Scene using Adversarial Loss [article]

Xiaochuan Yin, Henglai Wei, Penghong Lin, Xiangwei Wang, Qijun Chen
2018 arXiv   pre-print
Most of previous works focus on generating novel views of certain objects with a fixed background.  ...  The inverse depth features are obtained from CNNs trained with sparse labeled depth values. This framework can easily fuse multiple images from different viewpoints.  ...  Adversarial loss with a real input image classifier and the real pose θt regression. Adversarial loss with a generated image classifier and the fake random pose variables zp regression.  ... 
arXiv:1802.07064v1 fatcat:ecqfcuk7pbhmtg5lqq3f6gx2oe

Survey on depth and RGB image-based 3D hand shape and pose estimation

Lin Huang, Boshen Zhang, Zhilin Guo, Yang Xiao, Zhiguo Cao, Junsong Yuan
2021 Virtual Reality & Intelligent Hardware  
hand shape and pose estimation.  ...  With the availability of large-scale annotated hand datasets and the rapid developments of deep neural networks (DNNs), numerous DNN-based data-driven methods have been proposed for accurate and rapid  ...  networks for 3D hand pose estimation.  ... 
doi:10.1016/j.vrih.2021.05.002 fatcat:4tbhftt3ira6fporaqlscqhsse

Learning Auxiliary Monocular Contexts Helps Monocular 3D Object Detection [article]

Xianpeng Liu, Nan Xue, Tianfu Wu
2021 arXiv   pre-print
After training, the auxiliary context regression branches are discarded for better inference efficiency.  ...  The key idea is that with the annotated 3D bounding boxes of objects in an image, there is a rich set of well-posed projected 2D supervision signals available in training, such as the projected corner  ...  feature backbone and a list of regression heads with the same One the one hand, without the auxiliary components, our module architecture for the essential parameters and the aux- MonoCon is most  ... 
arXiv:2112.04628v1 fatcat:e5ev2xesvjgmpe5cfm57cscu6i

Aligning Latent Spaces for 3D Hand Pose Estimation

Linlin Yang, Shile Li, Dongheui Lee, Angela Yao
2019 2019 IEEE/CVF International Conference on Computer Vision (ICCV)  
Hand pose estimation from monocular RGB inputs is a highly challenging task.  ...  In this work, we propose to learn a joint latent representation that leverages other modalities as weak labels to improve RGB-based hand pose estimation.  ...  [11] use 3D voxels as input and regress the hand pose with a 3D CNN.  ... 
doi:10.1109/iccv.2019.00242 dblp:conf/iccv/YangLLY19 fatcat:tymbevymqraypfzhvii34awpsi