Spatio-temporal Tubelet Feature Aggregation and Object Linking in Videos
arXiv pre-print, 2020
This paper addresses the problem of exploiting the spatio-temporal information available in videos to improve object detection precision. We propose a two-stage object detector called FANet based on short-term spatio-temporal feature aggregation to produce a first set of detections, and long-term object linking to refine these detections. First, we generate a set of short tubelet proposals containing the object in N consecutive frames. Then, we aggregate RoI pooled deep features through the …
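The abstract describes pooling RoI features for the same tubelet across N consecutive frames and aggregating them into a single descriptor. Below is a minimal sketch of that short-term aggregation step, not the authors' implementation: the function name `aggregate_tubelet_features`, the mean-pooling temporal operator, and the tensor shapes are assumptions made for illustration only.

```python
import torch
from torchvision.ops import roi_align

def aggregate_tubelet_features(frame_features, tubelet_boxes, output_size=7):
    """Aggregate RoI-pooled features of one tubelet across N frames.

    frame_features: list of N tensors, each (1, C, H, W) - per-frame feature maps.
    tubelet_boxes:  tensor (N, 4) - the tubelet's box (x1, y1, x2, y2) per frame.
    Returns a single (C, output_size, output_size) descriptor.
    """
    pooled = []
    for feat, box in zip(frame_features, tubelet_boxes):
        # roi_align expects a list with one (K, 4) box tensor per image.
        roi = roi_align(feat, [box.unsqueeze(0)], output_size=output_size,
                        spatial_scale=1.0, aligned=True)  # (1, C, S, S)
        pooled.append(roi)
    # Temporal aggregation over the N frames: a simple mean here
    # (an assumption; other pooling operators could be used).
    return torch.cat(pooled, dim=0).mean(dim=0)

# Usage with random data for N = 3 frames:
N, C, H, W = 3, 256, 50, 50
feats = [torch.randn(1, C, H, W) for _ in range(N)]
boxes = torch.tensor([[10., 10., 40., 40.]] * N)
descriptor = aggregate_tubelet_features(feats, boxes)
print(descriptor.shape)  # torch.Size([256, 7, 7])
```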
arXiv:2004.00451v2