VLAD³: Encoding Dynamics of Deep Features for Action Recognition

Yingwei Li, Weixin Li, Vijay Mahadevan, Nuno Vasconcelos
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Previous approaches to action recognition with deep features tend to process video frames only within a small temporal region, and do not model long-range dynamic information explicitly. However, such information is important for the accurate recognition of actions, especially for the discrimination of complex activities that share sub-actions, and when dealing with untrimmed videos. Here, we propose a representation, VLAD for Deep Dynamics (VLAD³), that accounts for different levels of video dynamics. It captures short-term dynamics with deep convolutional neural network features, relying on linear dynamic systems (LDS) to model medium-range dynamics. To account for long-range inhomogeneous dynamics, a VLAD descriptor is derived for the LDS and pooled over the whole video, to arrive at the final VLAD³ representation. An extensive evaluation was performed on Olympic Sports, UCF101 and THUMOS15, where the use of the VLAD³ representation leads to state-of-the-art results.
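For readers unfamiliar with VLAD pooling, the following is a minimal sketch of the standard VLAD encoding applied to plain feature vectors. It is not the LDS-based descriptor proposed in the paper, whose derivation the abstract does not give. The function name `vlad_encode`, the toy codebook, and the toy descriptors are hypothetical and chosen only for illustration.

```python
import math

def vlad_encode(descriptors, codebook):
    """Standard VLAD (illustrative only, not the paper's LDS variant):
    sum the residuals of each descriptor to its nearest codeword,
    concatenate the per-codeword sums, then L2-normalize."""
    k, d = len(codebook), len(codebook[0])
    residuals = [[0.0] * d for _ in range(k)]
    for x in descriptors:
        # Hard-assign the descriptor to its nearest codeword
        # (squared Euclidean distance).
        nearest = min(
            range(k),
            key=lambda j: sum((xi - ci) ** 2 for xi, ci in zip(x, codebook[j])),
        )
        for i in range(d):
            residuals[nearest][i] += x[i] - codebook[nearest][i]
    flat = [v for row in residuals for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat

# Hypothetical toy data: two codewords and three 2-D "frame features".
codebook = [[0.0, 0.0], [10.0, 10.0]]
frames = [[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]]
encoding = vlad_encode(frames, codebook)
```

The resulting vector has one block of residuals per codeword, so its length is the number of codewords times the feature dimension, regardless of how many descriptors were pooled. That fixed length is what makes this kind of pooling usable over a whole video.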
doi:10.1109/cvpr.2016.215 dblp:conf/cvpr/LiLMV16 fatcat:25edszc4gbcy3pddlnmvcwf4la