Need for Speed: A Benchmark for Higher Frame Rate Object Tracking
2017 IEEE International Conference on Computer Vision (ICCV)
In this paper, we propose the first higher frame rate video dataset (called Need for Speed, or NfS) and benchmark for visual object tracking. The dataset consists of 100 videos (380K frames) captured from real-world scenarios with now commonly available higher frame rate (240 FPS) cameras. All frames are annotated with axis-aligned bounding boxes, and all sequences are manually labelled with nine visual attributes, such as occlusion, fast motion, and background clutter. Our benchmark provides an extensive evaluation of many recent and state-of-the-art trackers on higher frame rate sequences. We ranked each of these trackers according to their tracking accuracy and real-time performance. One of our surprising conclusions is that at higher frame rates, simple trackers such as correlation filters outperform complex methods based on deep networks. This suggests that for practical applications (such as in robotics or embedded vision), one needs to carefully trade off the bandwidth constraints associated with higher frame rate acquisition, the computational costs of real-time analysis, and the required application accuracy. Our dataset and benchmark allow, for the first time (to our knowledge), systematic exploration of such issues, and will be made available to allow for further research in this space.
Recent trackers can generally be divided into two categories: correlation filter (CF) trackers [1, 13, 7, 23, 9] and deep trackers [26, 2, 34, 31]. We briefly review each of these two categories below.