RGB2Hands: Real-Time Tracking of 3D Hand Interactions from Monocular RGB Video [article]

Jiayi Wang, Franziska Mueller, Florian Bernard, Suzanne Sorli, Oleksandr Sotnychenko, Neng Qian, Miguel A. Otaduy, Dan Casas, Christian Theobalt
2021 arXiv   pre-print
In contrast, in this work we present the first real-time method for motion capture of skeletal pose and 3D surface geometry of hands from a single RGB camera that explicitly considers close interactions … Tracking and reconstructing the 3D pose and geometry of two hands in interaction is a challenging problem that has high relevance for several human-computer interaction applications, including AR/VR … ACKNOWLEDGMENTS: The work was supported by the ERC Consolidator Grants 4DRepLy (770784) and TouchDesign (772738) and the Spanish Ministry of Science (RTI2018-098694-B-I00 VizLearning). …
arXiv:2106.11725v1

Interacting Attention Graph for Single Image Two-Hand Reconstruction [article]

Mengcheng Li, Liang An, Hongwen Zhang, Lianpeng Wu, Feng Chen, Tao Yu, Yebin Liu
2022 arXiv   pre-print
In this paper, we present Interacting Attention Graph Hand (IntagHand), the first graph-convolution-based network that reconstructs two interacting hands from a single RGB image. … The second module is the cross hand attention (CHA) module, which encodes the coherence of interacting hands by building dense cross-attention between the vertices of the two hands. … [40] contributes a monocular RGB-based two-hand reconstruction by tracking a dense matching map. …
arXiv:2203.09364v2

A Survey on RGB-D Datasets [article]

Alexandre Lopes, Roberto Souza, Helio Pedrini
2022 arXiv   pre-print
Hundreds of public RGB-D datasets containing various scenes, such as indoor, outdoor, aerial, driving, and medical, have been proposed. … These datasets can be applied to investigate the development of generalizable machine learning models in the monocular depth estimation field. … The robustness of the models is also evaluated with a cross-dataset strategy for estimating depth from monocular video [228]; instead of testing on multiple types of scenes, Ji et al. …
doi:10.48550/arxiv.2201.05761