Joint-task Self-supervised Learning for Temporal Correspondence
[article]
2019
arXiv
pre-print
This paper proposes to learn reliable dense correspondence from videos in a self-supervised manner. ...
Our method outperforms the state-of-the-art self-supervised methods on a variety of visual correspondence tasks, including video-object and part-segmentation propagation, keypoint tracking, and object ...
Self-supervised learning. ...
arXiv:1909.11895v1
fatcat:g6nxl6dvcfcz5l6hmvlxdxbqqa
Self-supervised Learning of Pose Embeddings from Spatiotemporal Relations in Videos
[article]
2017
arXiv
pre-print
To avoid the need for expensive labeling, we exploit spatiotemporal relations in training videos for self-supervised learning of pose embeddings. ...
The key idea is to combine temporal ordering and spatial placement estimation as auxiliary tasks for learning pose similarities in a Siamese convolutional network. ...
Acknowledgments: This work has been supported in part by the Heidelberg Academy for the Sciences, DFG, and by an NVIDIA hardware grant. ...
arXiv:1708.02179v1
fatcat:x3t3ig2c5zbablp4slyahe4efq
Context-Aware Sequence Alignment using 4D Skeletal Augmentation
[article]
2022
arXiv
pre-print
Moreover, we introduce a self-supervised learning scheme that is empowered by novel 4D augmentation techniques for 3D skeleton representations. ...
In this work, based on off-the-shelf human pose estimators, we propose a novel context-aware self-supervised learning architecture to align sequences of actions. We name it CASA. ...
The authors thank Jonas Hein, Mihai Dusmanu, Paul-Edouard Sarlin, Luca Cavalli, Yao Feng, and Weizhe Liu for helpful discussions. ...
arXiv:2204.12223v1
fatcat:i7nwb5g2a5aovkwe7g6xmmxkwi
Time-Contrastive Networks: Self-Supervised Learning from Video
[article]
2018
arXiv
pre-print
We propose a self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints, and study how this representation can be used ...
While representations are learned from an unlabeled collection of task-related videos, robot behaviors such as pouring are learned by watching a single 3rd-person demonstration by a human. ...
Interestingly, we find that for all joints except "shoulder pan", the unsupervised "TC+Self" model performs almost as well as the human-supervised "TC+Human+Self". ...
arXiv:1704.06888v3
fatcat:mqt2bdjvobc7lidrtvrc3rtnoi
Skeleton-Contrastive 3D Action Representation Learning
[article]
2021
arXiv
pre-print
This paper strives for self-supervised learning of a feature space suitable for skeleton-based action recognition. ...
Our approach achieves state-of-the-art performance for self-supervised learning from skeleton data on the challenging PKU and NTU datasets with multiple downstream tasks, including action recognition, ...
Unless mentioned otherwise we use | | = 15 for the joint jitter augmentation and = 0.1 for the temporal crop-resize augmentation. Self-Supervised Pretraining. ...
arXiv:2108.03656v1
fatcat:fb5uagfxvbexxd5m744k6u5ftq
Self-supervised Representation Learning for Ultrasound Video
[article]
2020
arXiv
pre-print
In this paper, we propose a self-supervised learning approach to learn meaningful and transferable representations from medical imaging video without any type of human annotation. ...
Therefore we force the model to address anatomy-aware tasks with free supervision from the data itself. ...
Here we explore this question through self-supervised representation learning, in which "self-supervised" indicates that the learning process is supervised purely based on the data itself (also termed ...
arXiv:2003.00105v1
fatcat:gybfpunhine5bdihb72cywu7ti
In this paper, we address self-supervised representation learning from human skeletons for action recognition. ...
Besides, we explore different training strategies to utilize the knowledge from self-supervised tasks for action recognition. ...
Multiple Self-Supervised Tasks We now describe our self-supervised learning techniques. ...
doi:10.1145/3394171.3413548
dblp:conf/mm/LinSY020
fatcat:vgmk7qtc7vfrbgktb2ae44sck4
Video-Text Representation Learning via Differentiable Weak Temporal Alignment
[article]
2022
arXiv
pre-print
Learning generic joint representations for video and text by a supervised method requires a prohibitively substantial amount of manually annotated video datasets. ...
But it is still challenging to learn joint embeddings of video and text in a self-supervised manner, due to its ambiguity and non-sequential alignment. ...
Related Work Self-Supervised Learning for Videos. ...
arXiv:2203.16784v1
fatcat:6yep4tkipff6jh4kcij4f7peaa
Pretext-Contrastive Learning: Toward Good Practices in Self-supervised Video Representation Leaning
[article]
2021
arXiv
pre-print
Recently, pretext-task based methods are proposed one after another in self-supervised video feature learning. Meanwhile, contrastive learning methods also yield good performance. ...
It is convenient to treat PCL as a standard training strategy and apply it to many other works in self-supervised video feature learning. ...
ACKNOWLEDGMENTS This work was partially financially supported by the Grants-in-Aid for Scientific Research Numbers JP19K20289 and JP18H03339 from JSPS. ...
arXiv:2010.15464v2
fatcat:wd3uc3vfcnehviiuonxoj3wfnq
DTG-Net: Differentiated Teachers Guided Self-Supervised Video Action Recognition
[article]
2020
arXiv
pre-print
Meanwhile, the DTG-Net is optimized in the way of contrastive self-supervised learning. ...
Specifically, leveraging the years of effort in action-related tasks, e.g., image classification, image-based action recognition, the DTG-Net learns the self-supervised video representation under various ...
Temporal base Self-supervised Learning For the video data, most self-supervised learning tasks are related to temporal constraints. ...
arXiv:2006.07609v1
fatcat:75ugmwcudbdkjiyytxf65vvu3a
Self-supervised Video Representation Learning Using Inter-intra Contrastive Framework
[article]
2020
arXiv
pre-print
We propose a self-supervised method to learn feature representations from videos. ...
A standard approach in traditional self-supervised methods uses positive-negative data pairs to train with contrastive learning strategy. ...
Many self-supervised learning techniques are proposed for image data. ...
arXiv:2008.02531v1
fatcat:jq6f3jf2ubahfkafrcraeonl7q
Contrastive Spatio-Temporal Pretext Learning for Self-supervised Video Representation
[article]
2021
arXiv
pre-print
Spatio-temporal representation learning is critical for video self-supervised representation. Recent approaches mainly use contrastive learning and pretext tasks. ...
Moreover, we employ a joint optimization combining pretext tasks with contrastive learning to further enhance the spatio-temporal representation learning. ...
pretext (CSTP) approach for video self-supervised learning. ...
arXiv:2112.08913v2
fatcat:c45xi6s74baajhi7tap2gckb7e
Self-supervised Video Transformer
[article]
2022
arXiv
pre-print
In this paper, we propose self-supervised training for video transformers using unlabeled video data. ...
Our self-supervised objective seeks to match the features of these different views representing the same video, to be invariant to spatiotemporal variations in actions. ...
Recent self-supervised methods perform on-par with supervised learning for certain vision tasks [11, 12, 16, 37]. ...
arXiv:2112.01514v2
fatcat:piqaitfjgbbq5ewvxw3zxwdpsm
Self-Supervised Spatio-Temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics
2019
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
In this paper we propose a novel self-supervised approach to learn spatio-temporal features for video representation. ...
While previous efforts address the problem by designing novel self-supervised tasks using video data, the learned features are merely on a frame-by-frame basis, which are not applicable to many video analytic ...
Dynamic Scene Recognition The performance on UCF101, HMDB51 and ASLAN dataset shows that our proposed self-supervised learning task can drive the C3D to learn powerful spatio-temporal features for action ...
doi:10.1109/cvpr.2019.00413
dblp:conf/cvpr/WangJBHLL19
fatcat:r6wofchpfzas5pcmqv6no6auye
Self-supervised Spatio-temporal Representation Learning for Videos by Predicting Motion and Appearance Statistics
[article]
2019
arXiv
pre-print
In this paper we propose a novel self-supervised approach to learn spatio-temporal features for video representation. ...
While previous efforts address the problem by designing novel self-supervised tasks using video data, the learned features are merely on a frame-by-frame basis, which are not applicable to many video analytic ...
Dynamic Scene Recognition The performance on UCF101, HMDB51 and ASLAN dataset shows that our proposed self-supervised learning task can drive the C3D to learn powerful spatio-temporal features for action ...
arXiv:1904.03597v1
fatcat:lsef6r7rvvbffasapxjglyiaui