Deep Learning for Visual Tracking: A Comprehensive Survey
[article]
2019
arXiv
pre-print
Visual target tracking is one of the most sought-after yet challenging research topics in computer vision. Given the ill-posed nature of the problem and its popularity in a broad range of real-world scenarios, a number of large-scale benchmark datasets have been established, on which a considerable number of methods have been developed, demonstrating significant progress in recent years, predominantly through deep learning (DL)-based methods. This survey aims to systematically investigate the current DL-based visual tracking methods, benchmark datasets, and evaluation metrics. It also extensively evaluates and analyzes the leading visual tracking methods. First, the fundamental characteristics, primary motivations, and contributions of DL-based methods are summarized from six key aspects: network architecture, network exploitation, network training for visual tracking, network objective, network output, and the exploitation of correlation filter advantages. Second, popular visual tracking benchmarks and their respective properties are compared, and their evaluation metrics are summarized. Third, the state-of-the-art DL-based methods are comprehensively examined on a set of well-established benchmarks: OTB2013, OTB2015, VOT2018, and LaSOT. Finally, by conducting critical analyses of these state-of-the-art methods both quantitatively and qualitatively, their pros and cons under various common scenarios are investigated. It may serve as a gentle usage guide for practitioners weighing when and under what conditions to choose which method(s). It also facilitates a discussion on ongoing issues and sheds light on promising research directions.
arXiv:1912.00535v1
fatcat:v5ikqi2cpbblhgtkiu6z6l5anq
Effective Fusion of Deep Multitasking Representations for Robust Visual Tracking
[article]
2021
arXiv
pre-print
Visual object tracking remains an active research field in computer vision due to persisting challenges with various problem-specific factors in real-world scenes. Many existing tracking methods based on discriminative correlation filters (DCFs) employ feature extraction networks (FENs) to model the target appearance during the learning process. However, using deep feature maps extracted from FENs based on different residual neural networks (ResNets) has not previously been investigated. This paper aims to evaluate the performance of twelve state-of-the-art ResNet-based FENs in a DCF-based framework to determine the best one for visual tracking. First, it ranks their best feature maps and explores the generalized adoption of the best ResNet-based FEN into another DCF-based method. Then, the proposed method extracts deep semantic information from a fully convolutional FEN and fuses it with the best ResNet-based feature maps to strengthen the target representation in the learning process of continuous convolution filters. Finally, it introduces a new and efficient semantic weighting method (using semantic segmentation feature maps on each video frame) to reduce the drift problem. Extensive experimental results on the well-known OTB-2013, OTB-2015, TC-128 and VOT-2018 visual tracking datasets demonstrate that the proposed method effectively outperforms state-of-the-art methods in terms of precision and robustness of visual tracking.
arXiv:2004.01382v2
fatcat:hsns3y46g5c7vdghlmgbzlniji
CHASE: Robust Visual Tracking via Cell-Level Differentiable Neural Architecture Search
[article]
2021
arXiv
pre-print
[Table excerpt; leading column headers are truncated in the source. Surviving headers: 2N, CHASE-WO, CHASE-S/T]
SR0.75 (↑): 49.2 51.1 54.3 54.8 56.1 56.5 51.4 45.9 56.1
SR0.5 (↑): 71.7 75.3 73.8 76.7 76.8 78.8 76.5 71.5 76.3
AO (↑): 61.1 63.6 63.4 64.9 65.6 67.0 64.2 60.7 65.6
and CHASE-
MARVASTI-ZADEH ...
arXiv:2107.03463v2
fatcat:ujzz6flr7nabdp4epxiwkzchby
Video temporal error concealment using improved directional boundary matching algorithm
2016
Turkish Journal of Electrical Engineering and Computer Sciences
Nowadays, multimedia systems aim to deliver high-quality digital video. Because of possible errors in communication channels, compressed video data may be damaged during transmission. Error concealment is a useful technique for concealing the effects of transmission errors at the decoder. In this paper, an improved directional boundary matching algorithm is presented, in which a direction is identified for each boundary by using the adjacent macroblocks (MBs) of the damaged MB. Then every boundary of the candidate MBs is compared to the identified direction. Finally, the candidate motion vector (MV) that minimizes the improved directional matching function is selected as the MV of the damaged MB. Furthermore, to increase the accuracy of damaged MV estimation, a specific weight is given to the boundaries of adjacent MBs. According to the experimental results, the proposed algorithm not only outperforms the classic boundary matching, outer boundary matching, and directional boundary matching algorithms in subjective visual evaluation, but also effectively improves the objective quality of reconstructed video frames.
doi:10.3906/elk-1409-88
fatcat:mmfwjoyeozdtzlu2xwnjxem7cm
Efficient Scale Estimation Methods using Lightweight Deep Convolutional Neural Networks for Visual Tracking
[article]
2020
arXiv
pre-print
In recent years, visual tracking methods based on discriminative correlation filters (DCF) have been very promising. However, most of these methods lack robust scale estimation. Although a wide range of recent DCF-based methods exploit features extracted from deep convolutional neural networks (CNNs) in their translation model, the scale of the visual target is still estimated from hand-crafted features. Because the exploitation of CNNs imposes a high computational burden, this paper exploits pre-trained lightweight CNN models to propose two efficient scale estimation methods, which not only improve visual tracking performance but also provide acceptable tracking speeds. The proposed methods are formulated based on either holistic or region representation of convolutional feature maps so that they integrate efficiently into DCF formulations and learn a robust scale model in the frequency domain. Moreover, in contrast to conventional scale estimation methods, which iteratively extract features from different target regions, the proposed methods use one-pass feature extraction processes that significantly improve computational efficiency. Comprehensive experimental results on the OTB-50, OTB-100, TC-128 and VOT-2018 visual tracking datasets demonstrate that the proposed visual tracking methods effectively outperform state-of-the-art methods.
arXiv:2004.02933v2
fatcat:p3sqisjjpvhbrcyzceknelefnm
Adaptive Exploitation of Pre-trained Deep Convolutional Neural Networks for Robust Visual Tracking
[article]
2020
arXiv
pre-print
Owing to automatic feature extraction via multi-layer nonlinear transformations, deep learning-based visual trackers have recently achieved great success in challenging visual tracking scenarios. Although many of those trackers utilize feature maps from pre-trained convolutional neural networks (CNNs), the effects of selecting different models and exploiting various combinations of their feature maps have not yet been compared thoroughly. To the best of our knowledge, all those methods use a fixed number of convolutional feature maps without considering the scene attributes (e.g., occlusion, deformation, and fast motion) that might occur during tracking. As a prerequisite, this paper proposes adaptive discriminative correlation filters (DCF) based on methods that can exploit CNN models with different topologies. First, the paper provides a comprehensive analysis of four commonly used CNN models to determine the best feature maps of each model. Second, using the analysis results as attribute dictionaries, adaptive exploitation of deep features is proposed to improve the accuracy and robustness of visual trackers with respect to video characteristics. Third, the generalization of the proposed method is validated on various tracking datasets as well as on CNN models with similar architectures. Finally, extensive experimental results demonstrate the effectiveness of the proposed adaptive method compared with state-of-the-art visual tracking methods.
arXiv:2008.13015v2
fatcat:r5b6kd3gfvcadf7xbviaxpd4zi
COMET: Context-Aware IoU-Guided Network for Small Object Tracking
[article]
2020
arXiv
pre-print
We consider the problem of tracking an unknown small target from aerial videos of medium to high altitudes. This is a challenging problem, which is even more pronounced in unavoidable scenarios of drastic camera motion and high density. To address this problem, we introduce a context-aware IoU-guided tracker (COMET) that exploits a multitask two-stream network and an offline reference proposal generation strategy. The proposed network fully exploits target-related information by multi-scale feature learning and attention modules. The proposed strategy introduces an efficient sampling strategy to generalize the network on the target and its parts without imposing extra computational complexity during online tracking. These strategies contribute considerably to handling significant occlusions and viewpoint changes. Empirically, COMET outperforms state-of-the-art methods on a range of aerial-view datasets that focus on tracking small objects. Specifically, COMET outperforms the celebrated ATOM tracker by an average margin of 6.2% (and 7%) in precision (and success) score on the challenging UAVDT, VisDrone-2019, and Small-90 benchmarks.
arXiv:2006.02597v3
fatcat:m2jmlmdryvbplfv6xkhgm6jxzi
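For readers unfamiliar with the precision and success scores quoted in this abstract: they are commonly derived from the center location error and the bounding-box overlap (IoU) between predicted and ground-truth boxes. The following is only a minimal sketch under assumed conventions, not the evaluation code of COMET or of any benchmark. The `(x, y, w, h)` box format, the function names, and the default thresholds are illustrative assumptions; each benchmark defines its own protocol.

```python
import math


def iou_xywh(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_rate(pred, gt, threshold=0.5):
    """Fraction of frames whose IoU exceeds the overlap threshold (assumed value)."""
    if not pred:
        return 0.0
    hits = sum(1 for p, g in zip(pred, gt) if iou_xywh(p, g) > threshold)
    return hits / len(pred)


def precision(pred, gt, pixel_threshold=20.0):
    """Fraction of frames whose center error is within pixel_threshold (assumed value)."""
    if not pred:
        return 0.0
    hits = 0
    for (px, py, pw, ph), (gx, gy, gw, gh) in zip(pred, gt):
        # Euclidean distance between the two box centers.
        err = math.hypot((px + pw / 2) - (gx + gw / 2), (py + ph / 2) - (gy + gh / 2))
        if err <= pixel_threshold:
            hits += 1
    return hits / len(pred)
```

Benchmarks typically sweep the threshold and report a curve or its area rather than a single value, so a single-threshold score like the one above is only the basic building block.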
A Novel Boundary Matching Algorithm for Video Temporal Error Concealment
2014
International Journal of Image Graphics and Signal Processing
With the fast growth of communication networks, video data transmission over these networks is extremely vulnerable. Error concealment is a technique to estimate the damaged data by employing the correctly received data at the decoder. In this paper, an efficient boundary matching algorithm for estimating damaged motion vectors (MVs) is proposed. The proposed algorithm performs error concealment for each damaged macroblock (MB) according to a priority list identified for each frame. It then adaptively uses either a classic boundary matching criterion or the proposed boundary matching criterion to measure the matching distortion at each boundary of a candidate MB. Finally, the candidate MV with minimum distortion is selected as the MV of the damaged MB, and the list of priorities is updated. Experimental results show that the proposed algorithm improves both the objective and subjective quality of reconstructed frames without any significant increase in computational cost. The PSNR for test sequences in some frames is increased by about 4.7, 4.5, and 4.4 dB compared to the classic boundary matching, directional boundary matching, and directional temporal boundary matching algorithms, respectively.
doi:10.5815/ijigsp.2014.06.01
fatcat:domkp6pwabhhxosvjd7cunbbgu
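The selection step described in this abstract (score each candidate MV by its boundary distortion, keep the minimum) can be illustrated with the classic boundary matching criterion. This is a hedged sketch of the classic criterion only, not the paper's proposed criterion or its priority list. The function names, the sum-of-absolute-differences cost, and the assumption that all four neighbouring boundaries were received correctly are illustrative choices.

```python
import numpy as np


def boundary_distortion(cur, ref, top, left, mv, size=16):
    """Classic boundary matching cost for one candidate motion vector.

    The candidate block is read from the reference frame at the damaged
    block position displaced by mv = (dy, dx). Its edge pixels are compared
    with the current-frame pixels just outside the damaged block.
    Assumes the damaged block does not touch the frame border.
    Returns None if the candidate block falls outside the reference frame.
    """
    dy, dx = mv
    r0, c0 = top + dy, left + dx
    if r0 < 0 or c0 < 0 or r0 + size > ref.shape[0] or c0 + size > ref.shape[1]:
        return None
    # Widen the dtype so uint8 subtraction cannot wrap around.
    cand = ref[r0:r0 + size, c0:c0 + size].astype(np.int64)
    cur = cur.astype(np.int64)
    rows = slice(top, top + size)
    cols = slice(left, left + size)
    return int(
        np.abs(cand[0, :] - cur[top - 1, cols]).sum()          # top boundary
        + np.abs(cand[-1, :] - cur[top + size, cols]).sum()    # bottom boundary
        + np.abs(cand[:, 0] - cur[rows, left - 1]).sum()       # left boundary
        + np.abs(cand[:, -1] - cur[rows, left + size]).sum()   # right boundary
    )


def select_motion_vector(cur, ref, top, left, candidates, size=16):
    """Return (best_mv, best_cost) over the candidate list, or (None, None)."""
    best_mv, best_cost = None, None
    for mv in candidates:
        cost = boundary_distortion(cur, ref, top, left, mv, size)
        if cost is None:
            continue  # candidate points outside the reference frame
        if best_cost is None or cost < best_cost:
            best_mv, best_cost = mv, cost
    return best_mv, best_cost
```

In practice the candidate list usually contains the zero MV and the MVs of neighbouring macroblocks; a directional or weighted variant would change only the cost function, not the selection loop.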
Beyond Background-Aware Correlation Filters: Adaptive Context Modeling by Hand-Crafted and Deep RGB Features for Visual Tracking
[article]
2021
arXiv
pre-print
Mojtaba Marvasti-Zadeh et al. ...
0.676 0.576 0.724 0.486
Staple 0.729 0.659 0.622 0.671 0.671 0.677 0.665 0.661 0.570 0.709 0.483
SRDCF 0.742 0.663 0.674 0.680 0.667 0.747 0.722 0.657 0.558 0.701 0.604
Seyed ...
arXiv:2004.02932v2
fatcat:dbjgzsequvcpxgq2qlilv6rmbi