7,282 Hits in 4.8 sec

Learning Spatio-Appearance Memory Network for High-Performance Visual Tracking [article]

Fei Xie, Wankou Yang, Bo Liu, Kaihua Zhang, Wanli Xue, Wangmeng Zuo
2021 arXiv   pre-print
To overcome this issue, this paper presents a novel segmentation-based tracking architecture, which is equipped with a spatio-appearance memory network to learn accurate spatio-temporal correspondence.  ...  In it, an appearance memory network explores spatio-temporal non-local similarity to learn the dense correspondence between the segmentation mask and the current frame.  ...  In our approach, we combine AMN and SMN to constitute the spatio-appearance memory network for improving segmentation and tracking performance.  ... 
arXiv:2009.09669v5 fatcat:cmp2pz4htjbcvkg2j4a6zxzhne
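The non-local matching this abstract describes — propagating a segmentation mask from a memory frame to the current frame via softmax-normalized feature similarity — can be sketched as a toy in plain Python. This is illustrative only, not the paper's implementation; the function name, the flat per-pixel feature lists, and the `temp` parameter are assumptions.

```python
import math

def nonlocal_propagate(mem_feats, mem_mask, cur_feats, temp=1.0):
    """Propagate a per-pixel mask from a memory frame to the current frame
    by softmax-weighted feature similarity (the non-local matching idea
    behind appearance-memory modules; a toy sketch, not the paper's code).

    mem_feats, cur_feats: lists of per-pixel feature vectors.
    mem_mask: list of 0/1 mask values, one per memory pixel.
    """
    out = []
    for q in cur_feats:
        # similarity of the query pixel to every memory pixel
        sims = [sum(a * b for a, b in zip(q, k)) / temp for k in mem_feats]
        m = max(sims)
        w = [math.exp(s - m) for s in sims]  # numerically stable softmax
        z = sum(w)
        # soft mask value = similarity-weighted average of the memory mask
        out.append(sum(wi * mi for wi, mi in zip(w, mem_mask)) / z)
    return out
```

Each output value is a soft foreground probability for a current-frame pixel, obtained by attending over all memory pixels at once.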

Learning Video Object Segmentation with Visual Memory [article]

Pavel Tokmakov, Karteek Alahari, Cordelia Schmid
2017 arXiv   pre-print
Given a video frame as input, our approach assigns each pixel an object or background label based on the learned spatio-temporal features as well as the "visual memory" specific to the video, acquired  ...  We introduce a novel two-stream neural network with an explicit memory module to achieve this.  ...  We gratefully acknowledge the support of NVIDIA with the donation of GPUs used for this research.  ... 
arXiv:1704.05737v2 fatcat:aoyzqp2bdrctld6a5n6et3tzby

Learning Video Object Segmentation with Visual Memory

Pavel Tokmakov, Karteek Alahari, Cordelia Schmid
2017 2017 IEEE International Conference on Computer Vision (ICCV)  
Given a video frame as input, our approach assigns each pixel an object or background label based on the learned spatio-temporal features as well as the "visual memory" specific to the video, acquired  ...  We introduce a novel two-stream neural network with an explicit memory module to achieve this.  ...  We gratefully acknowledge NVIDIA's support with the donation of GPUs used for this work.  ... 
doi:10.1109/iccv.2017.480 dblp:conf/iccv/TokmakovAS17 fatcat:isiusnapercwbgyasryh5jcufa

Spatially Supervised Recurrent Convolutional Neural Networks for Visual Object Tracking [article]

Guanghan Ning, Zhi Zhang, Chen Huang, Zhihai He, Xiaobo Ren, Haohong Wang
2016 arXiv   pre-print
In this paper, we develop a new approach of spatially supervised recurrent convolutional neural networks for visual object tracking.  ...  Our recurrent convolutional network exploits the history of locations as well as the distinctive visual features learned by the deep neural networks.  ...  We will focus on efficient online learning, in order to maintain high performance while tracking an object in unseen dynamics with real-time performance.  ... 
arXiv:1607.05781v1 fatcat:o3qgs5ggvvbnfeyw3qnung7gyi
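The snippet above says the recurrent network "exploits the history of locations as well as the distinctive visual features". As a hedged stand-in for the learned location recurrence (the paper trains an LSTM over concatenated visual and location inputs; this toy is only a constant-velocity extrapolation under assumed names):

```python
def predict_box(history):
    """Extrapolate the next (x, y, w, h) box from a location history.

    A hand-written constant-velocity stand-in for the learned recurrence
    described in the abstract -- purely illustrative, not the paper's LSTM.
    """
    if len(history) < 2:
        return history[-1]
    (x0, y0, _, _), (x1, y1, w1, h1) = history[-2], history[-1]
    # continue the last observed displacement; keep the latest box size
    return (2 * x1 - x0, 2 * y1 - y0, w1, h1)
```

With a single observed box the function simply repeats it; with two or more it continues the last displacement.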

Robust Long-Term Object Tracking via Improved Discriminative Model Prediction [article]

Seokeon Choi, Junhyun Lee, Yunsung Lee, Alexander Hauptmann
2020 arXiv   pre-print
We propose an improved discriminative model prediction method for robust long-term tracking based on a pre-trained short-term tracker.  ...  And then, we correct the tracking state of our model accordingly. (2) Random search with spatio-temporal constraints: we propose a robust random search method with a score penalty applied to prevent the  ...  Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon.  ... 
arXiv:2008.04722v2 fatcat:lsbo6z5kcfglfcn76gdkg353mq

AFAT: Adaptive Failure-Aware Tracker for Robust Visual Object Tracking [article]

Tianyang Xu, Zhen-Hua Feng, Xiao-Jun Wu, Josef Kittler
2020 arXiv   pre-print
Siamese approaches have achieved promising performance in visual object tracking recently.  ...  The key to the success of Siamese trackers is to learn appearance-invariant feature embedding functions via pair-wise offline training on large-scale video datasets.  ...  Therefore, the learned model enables high tracking accuracy in the test videos where the static and dynamic appearance variations are similar to those found in the training stage.  ... 
arXiv:2005.13708v1 fatcat:zo55gjj5krcdth6gbbagdpksbq

Prototypical Cross-Attention Networks for Multiple Object Tracking and Segmentation [article]

Lei Ke, Xia Li, Martin Danelljan, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu
2021 arXiv   pre-print
We propose Prototypical Cross-Attention Network (PCAN), capable of leveraging rich spatio-temporal information for online multiple object tracking and segmentation.  ...  To segment each object, PCAN adopts a prototypical appearance module to learn a set of contrastive foreground and background prototypes, which are then propagated over time.  ...  TrackFormer [25] performs joint object detection and tracking by recurrently using Transformers, while Stem-Seg [1] adopts a short 3D convolutional spatio-temporal volume to learn pixel embedding by  ... 
arXiv:2106.11958v2 fatcat:d5ohmgpobjgtfd47y5l6tdj2dy
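The "contrastive foreground and background prototypes" in this abstract can be illustrated with an EM-style soft clustering of pixel features into one foreground and one background prototype. This is a toy sketch in plain Python with assumed names and a single prototype per class; PCAN itself learns sets of prototypes and propagates them with cross-attention.

```python
import math

def learn_prototypes(feats, mask, iters=5, temp=0.5):
    """Toy EM-style prototype learning: one foreground and one background
    prototype, refined by soft assignment (illustrative only).

    feats: list of per-pixel feature vectors; mask: 0/1 foreground labels.
    Returns (fg_prototype, bg_prototype, soft_fg_assignments).
    """
    def mean(vecs):
        n = len(vecs)
        return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]

    # initialize prototypes by mask-averaging the features
    fg = mean([f for f, m in zip(feats, mask) if m])
    bg = mean([f for f, m in zip(feats, mask) if not m])
    assign = []
    for _ in range(iters):
        assign = []
        for f in feats:
            sf = sum(a * b for a, b in zip(f, fg)) / temp
            sb = sum(a * b for a, b in zip(f, bg)) / temp
            m = max(sf, sb)
            ef, eb = math.exp(sf - m), math.exp(sb - m)
            assign.append(ef / (ef + eb))  # soft foreground probability
        # re-estimate prototypes from the soft assignments
        d = len(feats[0])
        wf = sum(assign)
        wb = sum(1 - a for a in assign)
        fg = [sum(a * f[i] for a, f in zip(assign, feats)) / wf for i in range(d)]
        bg = [sum((1 - a) * f[i] for a, f in zip(assign, feats)) / wb for i in range(d)]
    return fg, bg, assign
```

The soft assignments play the role of a segmentation read-out: pixels close to the foreground prototype get values near 1, pixels close to the background prototype values near 0.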

Multiple People Tracking Using Hierarchical Deep Tracklet Re-identification [article]

Maryam Babaee, Ali Athar, Gerhard Rigoll
2018 arXiv   pre-print
To this end, tracklet re-identification is performed by utilizing a novel multi-stage deep network that can jointly reason about the visual appearance and spatio-temporal properties of a pair of tracklets  ...  By contrast, tracklet (a sequence of detections) re-identification can improve association accuracy since tracklets offer a richer set of visual appearance and spatio-temporal cues.  ...  This network consists of a CNN that learns pairwise detection visual appearance, and two bidirectional RNNs that learn spatio-temporal features, and aggregate visual and spatio-temporal features, respectively  ... 
arXiv:1811.04091v2 fatcat:qudujsjcpzd6tday6xtlzqvlkm

Region Aware Video Object Segmentation with Deep Motion Modeling [article]

Bo Miao and Mohammed Bennamoun and Yongsheng Gao and Ajmal Mian
2022 arXiv   pre-print
For efficient memory storage, we propose motion path memory to filter out redundant context by memorizing the features within the motion path of objects between two frames.  ...  To reduce redundancy, we present a Region Aware Video Object Segmentation (RAVOS) approach that predicts regions of interest (ROIs) for efficient object segmentation and memory storage.  ...  DMMNet [45] leverages spatio-temporal features to predict tracklets for tracking. TT17 [46] proposes an iterative clustering method to generate multiple high confidence tracklets for objects.  ... 
arXiv:2207.10258v1 fatcat:gmnhjhwa2zgr3oon63oeoxmuja

ST-MTL: Spatio-Temporal Multitask Learning Model to Predict Scanpath While Tracking Instruments in Robotic Surgery [article]

Mobarakol Islam, Vibashan VS, Chwee Ming Lim, Hongliang Ren
2021 arXiv   pre-print
To better capture long-term spatio-temporal dependencies, we enhance the long short-term memory (LSTM) module by concatenating high-level encoder features of consecutive frames.  ...  We propose an end-to-end trainable Spatio-Temporal Multi-Task Learning (ST-MTL) model with a shared encoder and spatio-temporal decoders for the real-time surgical instrument segmentation and task-oriented  ...  To learn better spatio-temporal correlation we concatenate the high-level encoder feature of the present frame (X_t) with that of the previous frame (X_{t−1}).  ... 
arXiv:2112.08189v1 fatcat:ck7rypz2lnfbplc4z3kejcjzae
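The temporal enhancement described here — concatenating the high-level encoder features of consecutive frames before the recurrent module — reduces, per spatial location, to a channel-wise concatenation. A minimal sketch; the data layout (a list of per-location channel vectors) and the function name are assumptions, not ST-MTL's code:

```python
def concat_temporal(feat_t, feat_prev):
    """Channel-wise concatenation of the current and previous frame's
    encoder features at each spatial location, before feeding the LSTM.

    feat_t, feat_prev: lists of channel vectors, one per spatial location;
    both must have the same spatial size.
    """
    assert len(feat_t) == len(feat_prev), "spatial sizes must match"
    return [list(a) + list(b) for a, b in zip(feat_t, feat_prev)]
```

The recurrent module then sees twice the channel count at each location, letting it correlate the two frames directly.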

Spatio-temporal feature based deep neural network for cell lineage analysis in microscopy images [article]

Siteng Chen, Andrew L. Paek, Kathleen A. Lasick, Suvithanandhini Loganathan, Janet Roveda, Ao Li
2021 bioRxiv   pre-print
They can learn complex visual features, capture long-range temporal dependencies, and have the potential to be used for automatic cell lineage analysis.  ...  Methods: In this study, we propose a multi-task spatio-temporal feature based deep neural network for cell lineage analysis (Cell-STN).  ...  Specifically, we designed the novel Cell-STN, including a spatio-temporal core network for shared cell features regarding appearance and activities and the task-specific networks for the tracking, mitosis  ... 
doi:10.1101/2021.08.28.457873 fatcat:no73jqcil5f6xfg4u6tz227oxy

Saliency Tubes: Visual Explanations for Spatio-Temporal Convolutions [article]

Alexandros Stergiou, Georgios Kapidis, Grigorios Kalliatakis, Christos Chrysoulas, Remco Veltkamp, Ronald Poppe
2019 arXiv   pre-print
Deep learning approaches have been established as the main methodology for video classification and recognition.  ...  We demonstrate our findings on widely used datasets for third-person and egocentric action classification and enhance the set of methods and visualizations that improve 3D Convolutional Neural Networks  ...  While there has been promising progress in the context of these 'visual explanations' for 2D CNNs, visualizing learned features of 3D convolutions, where the networks have access to not only the appearance  ... 
arXiv:1902.01078v2 fatcat:ejc7useqonbalb6fnozbv64nby

Learning to Focus and Track Extreme Climate Events

Sookyung Kim, Sunghyun Park, Sunghyo Chung, Joonseok Lee, Yunsung Lee, Hyojin Kim, Prabhat, Jaegul Choo
2019 British Machine Vision Conference  
It has unique challenges compared to other visual object tracking problems, including a wider range of spatio-temporal dynamics, the unclear boundary of the target, and the shortage of a labeled dataset  ...  It first learns to imprint the location and the appearance of the target at the first frame in an auto-encoding fashion.  ...  This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Lab. under contract DE-AC52-07NA27344 (LLNL-CONF-776815).  ... 
dblp:conf/bmvc/KimPCLLKPC19 fatcat:c5vwzjhrynfqrmwhuufhjcaase

Glimpse Clouds: Human Activity Recognition from Unstructured Feature Points

Fabien Baradel, Christian Wolf, Julien Mille, Graham W. Taylor
2018 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition  
These workers receive glimpses, jointly performing subsequent motion tracking and activity prediction.  ...  Instead, a visual attention module learns to predict glimpse sequences in each frame. These glimpses correspond to interest points in the scene that are relevant to the classified activities.  ...  In [53], visual memory is used to learn a spatio-temporal representation of moving objects in a scene. Memory is implemented as a convolutional GRU with a 2D spatial hidden state.  ... 
doi:10.1109/cvpr.2018.00056 dblp:conf/cvpr/Baradel0MT18 fatcat:g4jzqta3hfa4bmyr5jqprqwp4a

2019 Index IEEE Transactions on Circuits and Systems for Video Technology Vol. 29

2019 IEEE transactions on circuits and systems for video technology (Print)  
Revisiting Jump-Diffusion Process for Visual Tracking: A Reinforcement Learning Approach. TCSVT Oct. 2019, 2941-2959.  ... 
doi:10.1109/tcsvt.2019.2959179 fatcat:2bdmsygnonfjnmnvmb72c63tja