Joint-task Self-supervised Learning for Temporal Correspondence [article]

Xueting Li, Sifei Liu, Shalini De Mello, Xiaolong Wang, Jan Kautz, Ming-Hsuan Yang
2019 arXiv   pre-print
This paper proposes to learn reliable dense correspondence from videos in a self-supervised manner.  ...  Our method outperforms the state-of-the-art self-supervised methods on a variety of visual correspondence tasks, including video-object and part-segmentation propagation, keypoint tracking, and object  ...  Self-supervised learning.  ... 
arXiv:1909.11895v1 fatcat:g6nxl6dvcfcz5l6hmvlxdxbqqa

Self-supervised Learning of Pose Embeddings from Spatiotemporal Relations in Videos [article]

Ömer Sümer, Tobias Dencker, Björn Ommer
2017 arXiv   pre-print
To avoid the need for expensive labeling, we exploit spatiotemporal relations in training videos for self-supervised learning of pose embeddings.  ...  The key idea is to combine temporal ordering and spatial placement estimation as auxiliary tasks for learning pose similarities in a Siamese convolutional network.  ...  Acknowledgments: This work has been supported in part by the Heidelberg Academy for the Sciences, DFG, and by an NVIDIA hardware grant.  ... 
arXiv:1708.02179v1 fatcat:x3t3ig2c5zbablp4slyahe4efq

Context-Aware Sequence Alignment using 4D Skeletal Augmentation [article]

Taein Kwon, Bugra Tekin, Siyu Tang, Marc Pollefeys
2022 arXiv   pre-print
Moreover, we introduce a self-supervised learning scheme that is empowered by novel 4D augmentation techniques for 3D skeleton representations.  ...  In this work, based on off-the-shelf human pose estimators, we propose a novel context-aware self-supervised learning architecture to align sequences of actions. We name it CASA.  ...  The authors thank Jonas Hein, Mihai Dusmanu, Paul-Edouard Sarlin, Luca Cavalli, Yao Feng, and Weizhe Liu for helpful discussions.  ... 
arXiv:2204.12223v1 fatcat:i7nwb5g2a5aovkwe7g6xmmxkwi

Time-Contrastive Networks: Self-Supervised Learning from Video [article]

Pierre Sermanet, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, Sergey Levine
2018 arXiv   pre-print
We propose a self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints, and study how this representation can be used  ...  While representations are learned from an unlabeled collection of task-related videos, robot behaviors such as pouring are learned by watching a single 3rd-person demonstration by a human.  ...  Interestingly, we find that for all joints except "shoulder pan", the unsupervised "TC+Self" model performs almost as well as the human-supervised "TC+Human+Self".  ... 
arXiv:1704.06888v3 fatcat:mqt2bdjvobc7lidrtvrc3rtnoi

Skeleton-Contrastive 3D Action Representation Learning [article]

Fida Mohammad Thoker, Hazel Doughty, Cees G.M. Snoek
2021 arXiv   pre-print
This paper strives for self-supervised learning of a feature space suitable for skeleton-based action recognition.  ...  Our approach achieves state-of-the-art performance for self-supervised learning from skeleton data on the challenging PKU and NTU datasets with multiple downstream tasks, including action recognition,  ...  Unless mentioned otherwise we use | | = 15 for the joint jitter augmentation and = 0.1 for the temporal crop-resize augmentation. Self-Supervised Pretraining.  ... 
arXiv:2108.03656v1 fatcat:fb5uagfxvbexxd5m744k6u5ftq

Self-supervised Representation Learning for Ultrasound Video [article]

Jianbo Jiao, Richard Droste, Lior Drukker, Aris T. Papageorghiou, J. Alison Noble
2020 arXiv   pre-print
In this paper, we propose a self-supervised learning approach to learn meaningful and transferable representations from medical imaging video without any type of human annotation.  ...  Therefore we force the model to address anatomy-aware tasks with free supervision from the data itself.  ...  Here we explore this question through self-supervised representation learning, in which "self-supervised" indicates that the learning process is supervised purely based on the data itself (also termed  ... 
arXiv:2003.00105v1 fatcat:gybfpunhine5bdihb72cywu7ti

MS2L

Lilang Lin, Sijie Song, Wenhan Yang, Jiaying Liu
2020 Proceedings of the 28th ACM International Conference on Multimedia  
In this paper, we address self-supervised representation learning from human skeletons for action recognition.  ...  Besides, we explore different training strategies to utilize the knowledge from self-supervised tasks for action recognition.  ...  Multiple Self-Supervised Tasks We now describe our self-supervised learning techniques.  ... 
doi:10.1145/3394171.3413548 dblp:conf/mm/LinSY020 fatcat:vgmk7qtc7vfrbgktb2ae44sck4

Video-Text Representation Learning via Differentiable Weak Temporal Alignment [article]

Dohwan Ko, Joonmyung Choi, Juyeon Ko, Shinyeong Noh, Kyoung-Woon On, Eun-Sol Kim, Hyunwoo J. Kim
2022 arXiv   pre-print
Learning generic joint representations for video and text by a supervised method requires a prohibitively substantial amount of manually annotated video datasets.  ...  But it is still challenging to learn joint embeddings of video and text in a self-supervised manner, due to its ambiguity and non-sequential alignment.  ...  Related Work Self-Supervised Learning for Videos.  ... 
arXiv:2203.16784v1 fatcat:6yep4tkipff6jh4kcij4f7peaa

Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Leaning [article]

Li Tao, Xueting Wang, Toshihiko Yamasaki
2021 arXiv   pre-print
Recently, pretext-task based methods are proposed one after another in self-supervised video feature learning. Meanwhile, contrastive learning methods also yield good performance.  ...  It is convenient to treat PCL as a standard training strategy and apply it to many other works in self-supervised video feature learning.  ...  ACKNOWLEDGMENTS This work was partially financially supported by the Grants-in-Aid for Scientific Research Numbers JP19K20289 and JP18H03339 from JSPS.  ... 
arXiv:2010.15464v2 fatcat:wd3uc3vfcnehviiuonxoj3wfnq

DTG-Net: Differentiated Teachers Guided Self-Supervised Video Action Recognition [article]

Ziming Liu, Guangyu Gao, A. K. Qin, Jinyang Li
2020 arXiv   pre-print
Meanwhile, the DTG-Net is optimized in the way of contrastive self-supervised learning.  ...  Specifically, leveraging the years of effort in action-related tasks, e.g., image classification, image-based action recognition, the DTG-Net learns the self-supervised video representation under various  ...  Temporal base Self-supervised Learning For the video data, most self-supervised learning tasks are related to temporal constraints.  ... 
arXiv:2006.07609v1 fatcat:75ugmwcudbdkjiyytxf65vvu3a

Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework [article]

Li Tao, Xueting Wang, Toshihiko Yamasaki
2020 arXiv   pre-print
We propose a self-supervised method to learn feature representations from videos.  ...  A standard approach in traditional self-supervised methods uses positive-negative data pairs to train with contrastive learning strategy.  ...  Many self-supervised learning techniques are proposed for image data.  ... 
arXiv:2008.02531v1 fatcat:jq6f3jf2ubahfkafrcraeonl7q

Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation [article]

Yujia Zhang, Lai-Man Po, Xuyuan Xu, Mengyang Liu, Yexin Wang, Weifeng Ou, Yuzhi Zhao, Wing-Yin Yu
2021 arXiv   pre-print
Spatio-temporal representation learning is critical for video self-supervised representation. Recent approaches mainly use contrastive learning and pretext tasks.  ...  Moreover, we employ a joint optimization combining pretext tasks with contrastive learning to further enhance the spatio-temporal representation learning.  ...  pretext (CSTP) approach for video self-supervised learning.  ... 
arXiv:2112.08913v2 fatcat:c45xi6s74baajhi7tap2gckb7e

Self-supervised Video Transformer [article]

Kanchana Ranasinghe, Muzammal Naseer, Salman Khan, Fahad Shahbaz Khan, Michael Ryoo
2022 arXiv   pre-print
In this paper, we propose self-supervised training for video transformers using unlabeled video data.  ...  Our self-supervised objective seeks to match the features of these different views representing the same video, to be invariant to spatiotemporal variations in actions.  ...  Recent self-supervised methods perform on par with supervised learning for certain vision tasks [11, 12, 16, 37].  ... 
arXiv:2112.01514v2 fatcat:piqaitfjgbbq5ewvxw3zxwdpsm

Self-Supervised Spatio-Temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics

Jiangliu Wang, Jianbo Jiao, Linchao Bao, Shengfeng He, Yunhui Liu, Wei Liu
2019 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
In this paper we propose a novel self-supervised approach to learn spatio-temporal features for video representation.  ...  While previous efforts address the problem by designing novel self-supervised tasks using video data, the learned features are merely on a frame-by-frame basis, which are not applicable to many video analytic  ...  Dynamic Scene Recognition The performance on the UCF101, HMDB51 and ASLAN datasets shows that our proposed self-supervised learning task can drive the C3D to learn powerful spatio-temporal features for action  ... 
doi:10.1109/cvpr.2019.00413 dblp:conf/cvpr/WangJBHLL19 fatcat:r6wofchpfzas5pcmqv6no6auye

Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics [article]

Jiangliu Wang, Jianbo Jiao, Linchao Bao, Shengfeng He, Yunhui Liu, Wei Liu
2019 arXiv   pre-print
In this paper we propose a novel self-supervised approach to learn spatio-temporal features for video representation.  ...  While previous efforts address the problem by designing novel self-supervised tasks using video data, the learned features are merely on a frame-by-frame basis, which are not applicable to many video analytic  ...  Dynamic Scene Recognition The performance on the UCF101, HMDB51 and ASLAN datasets shows that our proposed self-supervised learning task can drive the C3D to learn powerful spatio-temporal features for action  ... 
arXiv:1904.03597v1 fatcat:lsef6r7rvvbffasapxjglyiaui