Depth2Action: Exploring Embedded Depth for Large-Scale Action Recognition [article]

Yi Zhu, Shawn Newsam
2016 arXiv pre-print
This paper performs the first investigation into depth for large-scale human action recognition in video where the depth cues are estimated from the videos themselves. We develop a new framework called depth2action and experiment thoroughly to determine how best to incorporate the depth information. We introduce spatio-temporal depth normalization (STDN) to enforce temporal consistency in our estimated depth sequences. We also propose modified depth motion maps (MDMM) to capture the subtle temporal changes in depth. These two components significantly improve action recognition performance. We evaluate our depth2action framework on three large-scale action recognition video benchmarks. Our model achieves state-of-the-art performance when combined with appearance and motion information, thus demonstrating that depth2action is indeed complementary to existing approaches.
arXiv:1608.04339v1 fatcat:eaby7yy5uzfw7huk3ftrk2k7fm