STResNet_CF Tracker: The deep spatiotemporal features learning for correlation filter based robust visual object tracking

Zhengyu Zhu, Bing Liu, Yunbo Rao, Qiao Liu, Rui Zhang
2019 IEEE Access  
Constructing a robust appearance model of the visual object is a crucial task for visual object tracking. Recently, a growing number of studies have combined spatial features with temporal features to improve tracking performance, and these methods have applied both kinds of features successfully to the tracking problem. This paper presents a novel method for visual object tracking based on spatiotemporal features combined with correlation filters. The visual features of the target object are extracted from a spatial-temporal residual network (STResNet) appearance model with two sub-networks. The STResNet appearance model learns spatial and temporal features separately, so that we can effectively use the spatial context surrounding the target object in each frame and the temporal relationship between successive frames to refine the appearance representation of the target object. Finally, the spatiotemporal fusion feature from the STResNet appearance model is incorporated into the correlation filter for robust visual object tracking. The experimental results show that our method achieves similar or better performance compared with other tracking methods based on convolutional neural networks or correlation filters.
INDEX TERMS Spatiotemporal residual network, correlation filter, visual object tracking, deep learning, convolutional neural networks.
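The abstract does not specify how the fused feature enters the correlation filter, so the following is only a minimal sketch of a generic multi-channel correlation filter solved in the Fourier domain, not the authors' implementation. It assumes that fusion means stacking spatial and temporal feature maps along the channel axis, and it uses random arrays as stand-ins for STResNet outputs. The function names (`gaussian_label`, `train_filter`, `respond`, `locate`) and the parameters `sigma` and `lam` are hypothetical.

```python
import numpy as np

def gaussian_label(h, w, sigma=2.0):
    # Desired response: a Gaussian peak at the patch centre.
    ys = np.arange(h) - h // 2
    xs = np.arange(w) - w // 2
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.exp(-(yy ** 2 + xx ** 2) / (2 * sigma ** 2))

def train_filter(feat, label, lam=1e-4):
    # feat: (C, H, W) feature map; label: (H, W) desired response.
    # Closed-form ridge-regression solution, one filter per channel.
    X = np.fft.fft2(feat, axes=(-2, -1))
    Y = np.fft.fft2(label)
    denom = np.sum(X * np.conj(X), axis=0).real + lam
    return Y[None] * np.conj(X) / denom

def respond(filt, feat):
    # Correlate the learned filter with a new feature map.
    Z = np.fft.fft2(feat, axes=(-2, -1))
    return np.fft.ifft2(np.sum(filt * Z, axis=0)).real

def locate(response):
    # Displacement of the response peak from the patch centre.
    h, w = response.shape
    py, px = np.unravel_index(np.argmax(response), response.shape)
    return int(py) - h // 2, int(px) - w // 2

# Stand-ins for network features (assumption: random data, not STResNet).
rng = np.random.default_rng(0)
spatial = rng.standard_normal((8, 32, 32))
temporal = rng.standard_normal((8, 32, 32))
fused = np.concatenate([spatial, temporal], axis=0)  # assumed fusion: channel stacking

filt = train_filter(fused, gaussian_label(32, 32))

# Simulate target motion with a circular shift of the feature map.
moved = np.roll(fused, shift=(3, -2), axis=(1, 2))
shift = locate(respond(filt, moved))
```

In this sketch the target's new position is read off as the offset of the response peak from the patch centre; a full tracker would also update the filter over time and handle scale changes, which are omitted here.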
doi:10.1109/access.2019.2903161 fatcat:uas76gauobcspkjwvuvy6wpevi