46,713 Hits in 5.7 sec

Associating Objects with Transformers for Video Object Segmentation [article]

Zongxin Yang, Yunchao Wei, Yi Yang
2021 arXiv   pre-print
The state-of-the-art methods learn to decode features with a single positive object and thus have to match and segment each target separately under multi-object scenarios, consuming multiple times computing  ...  This paper investigates how to realize better and more efficient embedding learning to tackle the semi-supervised video object segmentation under challenging multi-object scenarios.  ...  Revisit Previous Solutions for Video Object Segmentation In VOS, many common video scenarios have multiple targets or objects required for tracking and segmenting.  ... 
arXiv:2106.02638v3 fatcat:mwhmxpp2u5dmplxyquvkin4y7i

WeClick: Weakly-Supervised Video Semantic Segmentation with Click Annotations [article]

Peidong Liu, Zibin He, Xiyu Yan, Yong Jiang, Shutao Xia, Feng Zheng, Maowei Hu
2021 arXiv   pre-print
However, applying clicks to learn video semantic segmentation model has not been explored before.  ...  In this case, WeClick learns compact video semantic segmentation models with the low-cost click annotations during the training phase yet achieves real-time and accurate models during the inference period  ...  In order to promote the development of this field, we extend the classical click-based learning method [40] in image semantic segmentation to video semantic segmentation for learning semantic information  ... 
arXiv:2107.03088v2 fatcat:ygkvbebusvchhcdxu63wmmc2ry

TTVOS: Lightweight Video Object Segmentation with Adaptive Template Attention Module and Temporal Consistency Loss [article]

Hyojin Park, Ganesh Venkatesh, Nojun Kwak
2021 arXiv   pre-print
For doing this, various approaches have been developed based on online-learning, memory networks, and optical flow.  ...  Semi-supervised video object segmentation (semi-VOS) is widely used in many applications. This task is tracking class-agnostic objects from a given target mask.  ...  DAVIS16 is a single object task consisting of 30 training videos and 20 validation videos, and DAVIS17 is a multiple object task with 60 training videos and 30 validation videos.  ... 
arXiv:2011.04445v3 fatcat:cdv6mnp6bfbpraeaekkuo6vtii

Adaptive Template and Transition Map for Real-time Video Object Segmentation

Hyojin Park, Jayeon Yoo, Ganesh Venkatesh, Nojun Kwak
2021 IEEE Access  
INDEX TERMS Semi-supervised video object segmentation, video object segmentation, video object tracking, deep learning. FIGURE 1.  ...  Semi-supervised video object segmentation (semi-VOS) is required for many visual applications. This task is tracking class-agnostic objects from a given segmentation mask.  ...  INTRODUCTION Video object segmentation (VOS) is an essential technique to precisely identify the shape of target objects under various conditions in every video frame.  ... 
doi:10.1109/access.2021.3106353 fatcat:4o6tiguhmbf2ldffqte3kbfk7e

TransVOS: Video Object Segmentation with Transformers [article]

Jianbiao Mei, Mengmeng Wang, Yeneng Lin, Yi Yuan, Yong Liu
2021 arXiv   pre-print
Recently, Space-Time Memory Network (STM) based methods have achieved state-of-the-art performance in semi-supervised video object segmentation (VOS).  ...  In this paper, we propose a new transformer-based framework, termed TransVOS, introducing a vision transformer to fully exploit and model both the temporal and spatial relationships.  ...  The latter is the relationships among pixels in one specific frame, including object appearance information for target localization and segmentation, which is important for learning local target object  ... 
arXiv:2106.00588v2 fatcat:gwnnrnybcjge7km6pc7denohwe

Associating Objects with Scalable Transformers for Video Object Segmentation [article]

Zongxin Yang, Jiaxu Miao, Xiaohan Wang, Yunchao Wei, Yi Yang
2022 arXiv   pre-print
The state-of-the-art methods learn to decode features with a single positive object and thus have to match and segment each target separately under multi-object scenarios, consuming multiple times computation  ...  This paper investigates how to realize better and more efficient embedding learning to tackle the semi-supervised video object segmentation under challenging multi-object scenarios.  ...  REVISIT PREVIOUS VOS SOLUTIONS In VOS, many common video scenarios have multiple targets or objects required for tracking and segmenting.  ... 
arXiv:2203.11442v4 fatcat:3oai7pgkrffv3nphb6cvxdwtee

Video Object Segmentation with Episodic Graph Memory Networks [article]

Xiankai Lu, Wenguan Wang, Martin Danelljan, Tianfei Zhou, Jianbing Shen, Luc Van Gool
2020 arXiv   pre-print
How to make a segmentation model efficiently adapt to a specific video and to online target appearance variations are fundamentally crucial issues in the field of video object segmentation.  ...  learn an abstract method for storing useful representations in the memory and how to later use these representations for prediction, via gradient descent.  ...  Introduction Video object segmentation (VOS), as a core task in computer vision, aims to predict the target object in a video at the pixel level.  ... 
arXiv:2007.07020v4 fatcat:poacfhhxmzbpzk5jxhyd5awvx4

Robust Visual Tracking by Segmentation [article]

Matthieu Paul, Martin Danelljan, Christoph Mayer, Luc Van Gool
2022 arXiv   pre-print
Instead, we validate our segmentation quality on two popular video object segmentation datasets.  ...  Estimating the target extent poses a fundamental challenge in visual object tracking. Typically, trackers are box-centric and fully rely on a bounding box to define the target in the scene.  ...  Acknowledgements This work was partly supported by uniqFEED AG and the ETH Future Computing Laboratory (EFCL) financed by a gift from Huawei Technologies.  ... 
arXiv:2203.11191v2 fatcat:yjtq44v7mfbyzfhamsxobx3djy

Memory Aggregation Networks for Efficient Interactive Video Object Segmentation [article]

Jiaxu Miao, Yunchao Wei, Yi Yang
2020 arXiv   pre-print
Interactive video object segmentation (iVOS) aims at efficiently harvesting high-quality segmentation masks of the target object in a video with user interactions.  ...  Most previous state-of-the-arts tackle the iVOS with two independent networks for conducting user interaction and temporal propagation, respectively, leading to inefficiencies during the inference stage  ...  Acknowledgments This work is in part supported by ARC DP200100938 and ARC DECRA DE190101315.  ... 
arXiv:2003.13246v1 fatcat:rrsns4mdtrdnhceuhnorbcunse

Learning Dynamic Compact Memory Embedding for Deformable Visual Object Tracking [article]

Pengfei Zhu, Hongtao Yu, Kaihua Zhang, Yu Wang, Shuai Zhao, Lei Wang, Tianzhu Zhang, Qinghua Hu
2021 arXiv   pre-print
To further improve the segmentation accuracy for deformable objects, we employ a point-to-global matching strategy to measure the correlation between the pixel-wise query features and the whole template  ...  Besides, our method outperforms the excellent segmentation-based trackers, i.e., D3S and SiamMask on DAVIS2017 benchmark.  ...  Index Terms-Visual object tracking, compact memory, deformable feature, video object segmentation I.  ... 
arXiv:2111.11625v1 fatcat:qlf4e4q5yvglxannl3xqzgygaq

Learning Quality-aware Dynamic Memory for Video Object Segmentation [article]

Yong Liu, Ran Yu, Fei Yin, Xinyuan Zhao, Wei Zhao, Weihao Xia, Yujiu Yang
2022 arXiv   pre-print
Recently, several spatial-temporal memory-based methods have verified that storing intermediate frames and their masks as memory are helpful to segment target objects in videos.  ...  Then, we combine the segmentation quality with temporal consistency to dynamically update the memory bank to improve the practicability of the models.  ...  U1903213, the Shenzhen Key Laboratory of Marine IntelliSense and Computation (NO. ZDSYS20200811142605016.)  ... 
arXiv:2207.07922v1 fatcat:i6q5aahw5jhd7nq4m3vc6kk7la

Memory Aggregation Networks for Efficient Interactive Video Object Segmentation

Jiaxu Miao, Yunchao Wei, Yi Yang
2020 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
Interactive video object segmentation (iVOS) aims at efficiently harvesting high-quality segmentation masks of the target object in a video with user interactions.  ...  Most previous state-of-the-arts tackle the iVOS with two independent networks for conducting user interaction and temporal propagation, respectively, leading to inefficiencies during the inference stage  ...  Acknowledgments This work is in part supported by ARC DP200100938 and ARC DECRA DE190101315.  ... 
doi:10.1109/cvpr42600.2020.01038 dblp:conf/cvpr/MiaoWY20 fatcat:rqwevo5gnndm5bdpbskfbxdevi

Contrastive Transformation for Self-supervised Correspondence Learning [article]

Ning Wang, Wengang Zhou, Houqiang Li
2020 arXiv   pre-print
Our simple framework outperforms the recent self-supervised correspondence methods on a range of visual tasks including video object tracking (VOT), video object segmentation (VOS), pose keypoint tracking  ...  By forcing the transformation consistency between intra- and inter-video levels, the fine-grained correspondence associations are well preserved and the instance-level feature discrimination is effectively  ...  The work of Wengang Zhou was supported in part by the National Natural Science Foundation of China under Contract 61822208, Contract U20A20183, and Contract 61632019; and in part by the Youth Innovation  ... 
arXiv:2012.05057v1 fatcat:wpcoxq7qovfyhf6xbe6lfzwibm

Learning to Associate Every Segment for Video Panoptic Segmentation [article]

Sanghyun Woo, Dahun Kim, Joon-Young Lee, In So Kweon
2021 arXiv   pre-print
Temporal correspondence - linking pixels or objects across frames - is a fundamental supervisory signal for the video models.  ...  To validate our proposals, we adopt a deep siamese model and train the model to learn the temporal correspondence on two different levels (i.e., segment and pixel) along with the target task.  ...  It aims at a simultaneous prediction of object classes, masks, instance id associations, and semantic segmentation for all pixels in a video.  ... 
arXiv:2106.09453v1 fatcat:k4abtl6lfzftjp7rlenssush5i

Kernelized Memory Network for Video Object Segmentation [article]

Hongje Seong, Junhyuk Hyun, Euntai Kim
2020 arXiv   pre-print
Semi-supervised video object segmentation (VOS) is a task that involves predicting a target object in a video when the ground truth segmentation mask of the target object is given in the first frame.  ...  To solve the mismatch between STM and VOS, we propose a kernelized memory network (KMN). Before being trained on real videos, our KMN is pre-trained on static images, as in previous works.  ...  Related Work Semi-supervised video object segmentation [33, 34, 49] is a task involving prediction of the target objects in all frames of a video sequence where information of the target objects is provided  ... 
arXiv:2007.08270v1 fatcat:wok63aux3rd7bclhlvjozo6pqu
Showing results 1 to 15 out of 46,713 results