
Deep Learning for Visual Tracking: A Comprehensive Survey [article]

Seyed Mojtaba Marvasti-Zadeh, Li Cheng, Hossein Ghanei-Yakhdan, and Shohreh Kasaei
2019 arXiv   pre-print
Visual target tracking is one of the most sought-after yet challenging research topics in computer vision. Given the ill-posed nature of the problem and its popularity in a broad range of real-world scenarios, a number of large-scale benchmark datasets have been established, on which numerous methods have been developed with significant progress in recent years -- predominantly by recent deep learning (DL)-based methods. This survey aims to systematically investigate the current DL-based visual tracking methods, benchmark datasets, and evaluation metrics. It also extensively evaluates and analyzes the leading visual tracking methods. First, the fundamental characteristics, primary motivations, and contributions of DL-based methods are summarized from six key aspects: network architecture, network exploitation, network training for visual tracking, network objective, network output, and the exploitation of correlation filter advantages. Second, popular visual tracking benchmarks and their respective properties are compared, and their evaluation metrics are summarized. Third, the state-of-the-art DL-based methods are comprehensively examined on a set of well-established benchmarks: OTB2013, OTB2015, VOT2018, and LaSOT. Finally, by critically analyzing these state-of-the-art methods both quantitatively and qualitatively, their pros and cons under various common scenarios are investigated. The survey may serve as a gentle guide for practitioners weighing when and under what conditions to choose which method(s). It also facilitates a discussion of ongoing issues and sheds light on promising research directions.
arXiv:1912.00535v1 fatcat:v5ikqi2cpbblhgtkiu6z6l5anq

Effective Fusion of Deep Multitasking Representations for Robust Visual Tracking [article]

Seyed Mojtaba Marvasti-Zadeh, Hossein Ghanei-Yakhdan, Shohreh Kasaei, Kamal Nasrollahi, Thomas B. Moeslund
2021 arXiv   pre-print
Visual object tracking remains an active research field in computer vision due to persisting challenges with various problem-specific factors in real-world scenes. Many existing tracking methods based on discriminative correlation filters (DCFs) employ feature extraction networks (FENs) to model the target appearance during the learning process. However, using deep feature maps extracted from FENs based on different residual neural networks (ResNets) had not previously been investigated. This paper aims to evaluate the performance of twelve state-of-the-art ResNet-based FENs in a DCF-based framework to determine the best one for visual tracking purposes. First, it ranks their best feature maps and explores the generalized adoption of the best ResNet-based FEN into another DCF-based method. Then, the proposed method extracts deep semantic information from a fully convolutional FEN and fuses it with the best ResNet-based feature maps to strengthen the target representation in the learning process of continuous convolution filters. Finally, it introduces a new and efficient semantic weighting method (using semantic segmentation feature maps on each video frame) to reduce the drift problem. Extensive experimental results on the well-known OTB-2013, OTB-2015, TC-128, and VOT-2018 visual tracking datasets demonstrate that the proposed method effectively outperforms state-of-the-art methods in terms of precision and robustness.
arXiv:2004.01382v2 fatcat:hsns3y46g5c7vdghlmgbzlniji

CHASE: Robust Visual Tracking via Cell-Level Differentiable Neural Architecture Search [article]

Seyed Mojtaba Marvasti-Zadeh, Javad Khaghani, Li Cheng, Hossein Ghanei-Yakhdan, Shohreh Kasaei
2021 arXiv   pre-print
[Indexed excerpt: an ablation table comparing CHASE variants (including CHASE-WO and CHASE-S/T) on the SR0.75, SR0.5, and AO metrics.]
arXiv:2107.03463v2 fatcat:ujzz6flr7nabdp4epxiwkzchby

Video temporal error concealment using improved directional boundary matching algorithm

2016 Turkish Journal of Electrical Engineering and Computer Sciences  
Multimedia systems increasingly strive to deliver high-quality digital video. Because of possible errors in communication channels, compressed video data can be damaged during transmission. Error concealment is a useful technique for concealing the effects of transmission errors at the decoder. In this paper, an improved directional boundary matching algorithm is presented, in which a direction is identified for each boundary by using the macroblocks (MBs) adjacent to the damaged MB. Then every boundary of the candidate MBs is compared against its identified direction. Finally, the candidate motion vector (MV) that minimizes the improved directional matching function is selected as the MV of the damaged MB. Furthermore, to increase the accuracy of damaged-MV estimation, a specific weight is assigned to the boundaries of adjacent MBs. According to the experimental results, the proposed algorithm not only achieves better subjective visual quality than the classic boundary matching, outer boundary matching, and directional boundary matching algorithms, but also effectively improves the objective quality of reconstructed video frames.
doi:10.3906/elk-1409-88 fatcat:mmfwjoyeozdtzlu2xwnjxem7cm
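The candidate-MV selection described in the abstract above can be sketched roughly as follows; the distortion measure shown (a plain sum of absolute differences along one outer boundary, without the directional weighting the paper proposes) and all helper names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def boundary_distortion(frame, mb_xy, mb_size, mv):
    """Illustrative distortion: compare the outer boundary pixels above the
    damaged MB with the top boundary of the candidate MB displaced by mv."""
    x, y = mb_xy
    dx, dy = mv
    outer_top = frame[y - 1, x:x + mb_size].astype(np.int64)
    cand_top = frame[y + dy, x + dx:x + dx + mb_size].astype(np.int64)
    return np.abs(outer_top - cand_top).sum()

def conceal_mv(frame, mb_xy, mb_size, candidate_mvs):
    """Select the candidate motion vector with minimum boundary distortion."""
    return min(candidate_mvs,
               key=lambda mv: boundary_distortion(frame, mb_xy, mb_size, mv))
```

A real decoder would gather candidate MVs from the spatial and temporal neighbours of the damaged MB and evaluate all four boundaries, but the selection principle (minimum matching distortion wins) is the same.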

Efficient Scale Estimation Methods using Lightweight Deep Convolutional Neural Networks for Visual Tracking [article]

Seyed Mojtaba Marvasti-Zadeh, Hossein Ghanei-Yakhdan, Shohreh Kasaei
2020 arXiv   pre-print
In recent years, visual tracking methods based on discriminative correlation filters (DCFs) have been very promising. However, most of these methods lack robust scale estimation. Although a wide range of recent DCF-based methods exploit features extracted from deep convolutional neural networks (CNNs) in their translation model, the scale of the visual target is still estimated with hand-crafted features. Because the exploitation of CNNs imposes a high computational burden, this paper exploits pre-trained lightweight CNN models to propose two efficient scale estimation methods, which not only improve visual tracking performance but also provide acceptable tracking speeds. The proposed methods are formulated on either holistic or region representations of convolutional feature maps, which integrate efficiently into DCF formulations to learn a robust scale model in the frequency domain. Moreover, in contrast to conventional scale estimation methods that iteratively extract features from different target regions, the proposed methods use a one-pass feature extraction process that significantly improves computational efficiency. Comprehensive experimental results on the OTB-50, OTB-100, TC-128, and VOT-2018 visual tracking datasets demonstrate that the proposed methods effectively outperform state-of-the-art methods.
arXiv:2004.02933v2 fatcat:p3sqisjjpvhbrcyzceknelefnm
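As an illustration of the one-pass idea mentioned in the abstract above, the sketch below crops several scaled regions from a single precomputed feature map instead of re-running the network once per scale; the 2-D single-channel feature map, the nearest-neighbour resizing, and the function names are simplifying assumptions, not the paper's actual pipeline.

```python
import numpy as np

def one_pass_scale_features(feature_map, center, base_size, scale_factors, out_size):
    """Crop one precomputed feature map at several scales (one CNN forward
    pass), rather than extracting features per scaled target region."""
    cy, cx = center
    crops = []
    for s in scale_factors:
        half = int(round(base_size * s / 2))
        y0, y1 = max(cy - half, 0), min(cy + half, feature_map.shape[0])
        x0, x1 = max(cx - half, 0), min(cx + half, feature_map.shape[1])
        crop = feature_map[y0:y1, x0:x1]
        # Nearest-neighbour resize to a fixed spatial size so every scale
        # can feed one shared DCF scale model.
        ys = np.linspace(0, crop.shape[0] - 1, out_size).round().astype(int)
        xs = np.linspace(0, crop.shape[1] - 1, out_size).round().astype(int)
        crops.append(crop[np.ix_(ys, xs)])
    return np.stack(crops)
```

The saving comes from the loop touching only array slices: the expensive CNN forward pass happens once for the whole search region, regardless of how many scale factors are evaluated.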

Adaptive Exploitation of Pre-trained Deep Convolutional Neural Networks for Robust Visual Tracking [article]

Seyed Mojtaba Marvasti-Zadeh, Hossein Ghanei-Yakhdan, Shohreh Kasaei
2020 arXiv   pre-print
Due to their automatic feature extraction via multi-layer nonlinear transformations, deep learning-based visual trackers have recently achieved great success in challenging tracking scenarios. Although many of these trackers utilize feature maps from pre-trained convolutional neural networks (CNNs), the effects of selecting different models and exploiting various combinations of their feature maps have not yet been thoroughly compared. To the best of our knowledge, all of those methods use a fixed number of convolutional feature maps without considering the scene attributes (e.g., occlusion, deformation, and fast motion) that might occur during tracking. As a prerequisite, this paper proposes adaptive discriminative correlation filter (DCF)-based methods that can exploit CNN models with different topologies. First, the paper provides a comprehensive analysis of four commonly used CNN models to determine the best feature maps of each model. Second, using the analysis results as attribute dictionaries, adaptive exploitation of deep features is proposed to improve the accuracy and robustness of visual trackers with respect to video characteristics. Third, the generalization of the proposed method is validated on various tracking datasets as well as on CNN models with similar architectures. Finally, extensive experimental results demonstrate the effectiveness of the proposed adaptive method compared with state-of-the-art visual tracking methods.
arXiv:2008.13015v2 fatcat:r5b6kd3gfvcadf7xbviaxpd4zi

COMET: Context-Aware IoU-Guided Network for Small Object Tracking [article]

Seyed Mojtaba Marvasti-Zadeh, Javad Khaghani, Hossein Ghanei-Yakhdan, Shohreh Kasaei, Li Cheng
2020 arXiv   pre-print
We consider the problem of tracking an unknown small target in aerial videos taken from medium to high altitudes. This is a challenging problem, which is even more pronounced in unavoidable scenarios of drastic camera motion and high object density. To address it, we introduce a context-aware IoU-guided tracker (COMET) that exploits a multitask two-stream network and an offline reference proposal generation strategy. The proposed network fully exploits target-related information through multi-scale feature learning and attention modules. The proposed strategy introduces an efficient sampling scheme to generalize the network to the target and its parts without imposing extra computational complexity during online tracking. These strategies contribute considerably to handling significant occlusions and viewpoint changes. Empirically, COMET outperforms the state of the art on a range of aerial-view datasets focusing on small object tracking. Specifically, COMET outperforms the celebrated ATOM tracker by an average margin of 6.2% (and 7%) in precision (and success) score on the challenging UAVDT, VisDrone-2019, and Small-90 benchmarks.
arXiv:2006.02597v3 fatcat:m2jmlmdryvbplfv6xkhgm6jxzi

A Novel Boundary Matching Algorithm for Video Temporal Error Concealment

Seyed Mojtaba Marvasti Zadeh, Hossein Ghanei Yakhdan, Shohreh Kasaei
2014 International Journal of Image Graphics and Signal Processing  
With the fast growth of communication networks, video data transmitted over these networks is extremely vulnerable. Error concealment is a technique that estimates damaged data from the correctly received data at the decoder. In this paper, an efficient boundary matching algorithm for estimating damaged motion vectors (MVs) is proposed. The proposed algorithm performs error concealment for each damaged macroblock (MB) according to a list of identified priorities for each frame. It then adaptively uses either the classic boundary matching criterion or the proposed boundary matching criterion to measure the matching distortion at each boundary of a candidate MB. Finally, the candidate MV with minimum distortion is selected as the MV of the damaged MB, and the list of priorities is updated. Experimental results show that the proposed algorithm improves both the objective and subjective quality of reconstructed frames without any significant increase in computational cost. In some frames of the test sequences, the PSNR is increased by about 4.7, 4.5, and 4.4 dB compared to the classic boundary matching, directional boundary matching, and directional temporal boundary matching algorithms, respectively.
doi:10.5815/ijigsp.2014.06.01 fatcat:domkp6pwabhhxosvjd7cunbbgu
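The priority-driven concealment order described in the abstract above can be sketched as follows; since the abstract does not specify how priorities are computed, the measure used here (the count of available neighbouring MBs) and all names are illustrative assumptions.

```python
def conceal_in_priority_order(damaged, neighbors, available):
    """Conceal damaged macroblocks highest-priority first, where priority is
    the number of available (correctly received or already-concealed)
    neighbouring MBs; priorities are refreshed after each concealment."""
    available = set(available)
    damaged = set(damaged)
    order = []
    while damaged:
        # Re-rank remaining damaged MBs: more available neighbours wins.
        mb = max(damaged, key=lambda m: sum(n in available for n in neighbors[m]))
        order.append(mb)      # here a real decoder would estimate the damaged MV
        damaged.remove(mb)
        available.add(mb)     # a concealed MB can now support its neighbours
    return order
```

The point of refreshing priorities is that each concealed MB supplies boundary information for its still-damaged neighbours, so later estimates have more context to match against.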

Beyond Background-Aware Correlation Filters: Adaptive Context Modeling by Hand-Crafted and Deep RGB Features for Visual Tracking [article]

Seyed Mojtaba Marvasti-Zadeh, Hossein Ghanei-Yakhdan, Shohreh Kasaei
2021 arXiv   pre-print
[Indexed excerpt: a results table comparing trackers, including Staple and SRDCF, across evaluation attributes.]
arXiv:2004.02932v2 fatcat:dbjgzsequvcpxgq2qlilv6rmbi