A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2020; you can also visit the original URL.
The file type is application/pdf.
Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks
2017
2017 IEEE International Conference on Computer Vision (ICCV)
Convolutional Neural Networks (CNN) have been regarded as a powerful class of models for image recognition problems. Nevertheless, it is not trivial when utilizing a CNN for learning spatio-temporal video representation. A few studies have shown that performing 3D convolutions is a rewarding approach to capture both spatial and temporal dimensions in videos. However, the development of a very deep 3D CNN from scratch results in expensive computational cost and memory demand. A valid question is
doi:10.1109/iccv.2017.590
dblp:conf/iccv/QiuYM17
fatcat:zlpfokrdvbfdbko5hn7epmdtju