Converting video classification problem to image classification with global descriptors and pre-trained network

Saeedeh Zebhi, SMT Al-Modarresi, Vahid Abootalebi
2020 IET Computer Vision  
A motion history image (MHI) is a spatio-temporal template in which temporal motion information is collapsed into a single image whose intensity is a function of the recency of motion; it also retains spatial information. An energy image (EI) based on the magnitude of optical flow is a temporal template that shows only the temporal information of motion. Each video can be described by these templates, and four new methods are introduced in this study on that basis. The first three are called basic methods. In method 1, each video is split into N groups of consecutive frames and an MHI is calculated for each group. Transfer learning with fine-tuning is used to classify these templates. Method 2 classifies EIs in the same way as method 1. Method 3 fuses the two streams of these templates. Finally, spatial information is added in method 4. Method 4 outperforms the others and is called the proposed method. It achieves recognition accuracies of 92.30% and 94.50% on the UCF Sport and UCF-11 action data sets, respectively. The proposed method is also compared with state-of-the-art approaches, and the results show that it has the best performance.
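
To make the template idea in method 1 concrete, the following is a minimal NumPy sketch of splitting a video into N groups of consecutive frames and collapsing each group into one MHI. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the motion detector, the decay rule, or any parameter values, so the frame-differencing threshold, the linear decay, and the names `motion_history_image`, `group_templates`, and `diff_threshold` are all hypothetical.

```python
import numpy as np


def motion_history_image(frames, diff_threshold=30):
    """Collapse grayscale frames of shape (T, H, W) into one MHI (uint8).

    Assumption: motion is detected by thresholding the absolute difference
    of consecutive frames, and older motion fades linearly by one step per
    frame. The paper may use a different formulation.
    """
    frames = np.asarray(frames, dtype=np.float32)
    num_frames = frames.shape[0]
    tau = max(num_frames - 1, 1)  # number of update steps, at least 1
    mhi = np.zeros(frames.shape[1:], dtype=np.float32)

    for t in range(1, num_frames):
        moving = np.abs(frames[t] - frames[t - 1]) >= diff_threshold
        # Pixels that just moved get the maximum value; the rest decay.
        mhi = np.where(moving, float(tau), np.maximum(mhi - 1.0, 0.0))

    # Scale to 0..255 so that brighter means more recent motion.
    return (mhi / tau * 255.0).astype(np.uint8)


def group_templates(frames, n_groups):
    """Split a video into n_groups runs of consecutive frames, one MHI each."""
    groups = np.array_split(np.asarray(frames), n_groups, axis=0)
    return [motion_history_image(group) for group in groups]


if __name__ == "__main__":
    # Synthetic clip: one bright pixel moving left to right along the top row.
    clip = np.zeros((5, 5, 5), dtype=np.uint8)
    for t in range(5):
        clip[t, 0, t] = 255
    print(motion_history_image(clip)[0])
    print(len(group_templates(clip, 2)))
```

Each resulting template is an ordinary single-channel image, which is what allows a pre-trained image classification network to be fine-tuned on it; the energy images of method 2 would take the place of the MHIs in the same pipeline.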
doi:10.1049/iet-cvi.2019.0625 fatcat:zwmndlydmbb7xbfstz63eew274