First Person Action Recognition Using Deep Learned Descriptors

Suriya Singh, Chetan Arora, C. V. Jawahar
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
We focus on the problem of recognizing the wearer's actions in first-person (a.k.a. egocentric) videos. This problem is more challenging than third-person activity recognition because the wearer's pose is unavailable and the wearer's natural head motion causes sharp movements in the videos. Carefully crafted features based on hand and object cues have been shown to be successful on limited, targeted datasets. We propose convolutional neural networks (CNNs) for end-to-end learning and classification of the wearer's actions. The proposed network exploits egocentric cues by capturing hand pose, head motion, and a saliency map. The network is compact and can be trained from the relatively small number of labeled egocentric videos that are available. We show that the proposed network generalizes, giving state-of-the-art performance on several disparate egocentric action datasets.
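The cue-fusion idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's architecture: the three feature extractors below are toy stand-ins for the hand-pose, head-motion, and saliency streams, and the classifier weights are random rather than learned end to end.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax over action scores.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def extract_cues(frame):
    # Toy stand-ins for the three egocentric cue streams (hypothetical,
    # for illustration only; the paper learns these with CNNs):
    hand = frame.mean(axis=(0, 1))                              # hand-pose proxy
    motion = np.abs(np.diff(frame, axis=0)).mean(axis=(0, 1))   # head-motion proxy
    saliency = frame.std(axis=(0, 1))                           # saliency proxy
    # Late fusion by concatenating the per-stream descriptors.
    return np.concatenate([hand, motion, saliency])

frame = rng.random((32, 32, 3))     # one H x W x 3 video frame
feat = extract_cues(frame)          # 9-dimensional fused descriptor

# Linear classifier over the fused cues; in the paper these weights
# would be trained jointly with the streams.
n_actions = 5
W = rng.standard_normal((n_actions, feat.size))
probs = softmax(W @ feat)
pred = int(np.argmax(probs))        # predicted action index
```

The point of the sketch is the fusion step: each cue contributes an independent descriptor, and the classifier sees their concatenation, so no single cue (e.g. hand pose alone) has to carry the recognition.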
doi:10.1109/cvpr.2016.287 dblp:conf/cvpr/SinghAJ16