H+O: Unified Egocentric Recognition of 3D Hand-Object Poses and Interactions

Bugra Tekin, Federica Bogo, Marc Pollefeys
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
We present a unified framework for understanding 3D hand and object interactions in raw image sequences from egocentric RGB cameras. Given a single RGB image, our model jointly estimates the 3D hand and object poses, models their interactions, and recognizes the object and action classes with a single feed-forward pass through a neural network. We propose a single architecture that does not rely on external detection algorithms but rather is trained end-to-end on single images. We further merge and propagate information in the temporal domain to infer interactions between hand and object trajectories and recognize actions. The complete model takes as input a sequence of frames and outputs per-frame 3D hand and object pose predictions along with the estimates of object and action categories for the entire sequence. We demonstrate state-of-the-art performance of our algorithm even in comparison to the approaches that work on depth data and ground-truth annotations.
doi:10.1109/cvpr.2019.00464 dblp:conf/cvpr/TekinBP19 fatcat:xz7iop75wvdybmqbtyaprvhwru