stagNet: An Attentive Semantic RNN for Group Activity and Individual Action Recognition

Mengshi Qi, Yunhong Wang, Jie Qin, Annan Li, Jiebo Luo, Luc Van Gool
2019 IEEE Transactions on Circuits and Systems for Video Technology (Print)
Group activity recognition plays a fundamental role in a variety of applications, e.g., sports video analysis, abnormal behavior detection, and intelligent surveillance. In a complex dynamic scene, a crucial yet challenging issue is how to better model the spatio-temporal contextual information and inter-person relationships. In this paper, we present a novel attentive semantic recurrent neural network (RNN), namely stagNet, for understanding group activities and individual actions in videos by combining a spatio-temporal attention mechanism with semantic graph modeling. Specifically, a structured semantic graph is explicitly modeled to express the spatial contextual content of the whole scene, and is then combined with the temporal factor through a structural-RNN. By virtue of the 'factor sharing' and 'message passing' mechanisms, stagNet is capable of extracting discriminative and informative spatio-temporal representations and capturing inter-person relationships. Moreover, we adopt a spatio-temporal attention model to focus on key persons/frames for improved recognition performance. In addition, a body-region attention and a global-part feature pooling strategy are devised for individual action recognition. In experiments, four widely used public datasets are adopted for performance evaluation, and the extensive results demonstrate the superiority and effectiveness of our method.
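The idea of attending to key persons can be illustrated with a minimal sketch. This is not the formulation used in the paper: the function names `softmax` and `attention_pool`, the linear scoring vector `w`, and the toy feature values are all assumptions made for illustration. The sketch scores each person's feature vector, normalizes the scores with a softmax, and pools the features into one scene-level vector using those weights.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(person_feats, w):
    """Pool per-person features into one scene-level feature.

    person_feats: list of N feature vectors (one per person), each of length D.
    w: hypothetical scoring vector of length D (an assumption, not from the paper).
    Returns (pooled feature of length D, attention weights of length N).
    """
    # Score each person with a dot product against w.
    scores = [sum(f_i * w_i for f_i, w_i in zip(f, w)) for f in person_feats]
    # Normalize scores so the weights over persons sum to 1.
    alpha = softmax(scores)
    dim = len(person_feats[0])
    # Weighted sum of person features, one output value per dimension.
    pooled = [sum(a * f[d] for a, f in zip(alpha, person_feats)) for d in range(dim)]
    return pooled, alpha

# Toy example: three persons with 2-dimensional features.
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
uniform_pooled, uniform_alpha = attention_pool(feats, [0.0, 0.0])
focused_pooled, focused_alpha = attention_pool(feats, [5.0, 0.0])
```

With a zero scoring vector every person receives equal weight, so the pooled feature is the plain average. A scoring vector that favors the first dimension shifts the weight toward the persons whose features are large in that dimension.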
doi:10.1109/tcsvt.2019.2894161 fatcat:wcjvyo3wgfbsfcew4x62sw6cfi