A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
The file type is application/pdf.
ASCNet: Self-supervised Video Representation Learning with Appearance-Speed Consistency
[article]
2021
arXiv
pre-print
We study self-supervised video representation learning, which is a challenging task due to 1) lack of labels for explicit supervision; 2) unstructured and noisy visual information. ...
The appearance consistency task aims to maximize the similarity between two clips of the same video with different playback speeds. ...
Self-supervised video representation learning aims to learn an encoder f(·; θ) to map the clip c_i to consistent feature x_i under different video augmentations. ...
arXiv:2106.02342v2
fatcat:y6tjnnupmfedjgy7cwanonc6ke
Auxiliary Learning for Self-Supervised Video Representation via Similarity-based Knowledge Distillation
[article]
2022
arXiv
pre-print
self-supervised representations. ...
Despite the outstanding success of self-supervised pretraining methods for video representation learning, they generalise poorly when the unlabeled dataset for pretraining is small or the domain difference ...
ASCNet achieves the best results with a combined appearance and speed manipulation approach. ...
arXiv:2112.04011v3
fatcat:kqy57av54fe3bcnqn6msje2mou
Motion-aware Contrastive Video Representation Learning via Foreground-background Merging
[article]
2022
arXiv
pre-print
In light of the success of contrastive learning in the image domain, current self-supervised video representation learning methods usually employ contrastive loss to facilitate video representation learning ...
When naively pulling two augmented views of a video closer, the model however tends to learn the common static background as a shortcut but fails to capture the motion information, a phenomenon dubbed ...
Self-supervised Video Representation Learning. ...
arXiv:2109.15130v3
fatcat:qy2voj5fxfbl3dla7svjpp54l4
Controllable Augmentations for Video Representation Learning
[article]
2022
arXiv
pre-print
This paper focuses on self-supervised video representation learning. ...
We also introduce local-global temporal order dependency to further bridge the gap between clip-level and video-level representations for robust temporal modeling. ...
Method: The core idea of our proposed framework is to enhance self-supervised video representation learning by comprehensive appearance and motion content modeling. ...
arXiv:2203.16632v2
fatcat:zeqwmlv7pbg25oqzx6eu43hiri
Dual Contrastive Learning for Spatio-temporal Representation
[article]
2022
pre-print
Contrastive learning has shown promising potential in self-supervised spatio-temporal representation learning. Most works naively sample different clips to construct positive and negative pairs. ...
We term our method as Dual Contrastive Learning for spatio-temporal Representation (DCLR). ...
Related Work: Self-supervised Representation Learning. ...
doi:10.1145/3503161.3547783
arXiv:2207.05340v1
fatcat:yka5o3jdgreo5dhucwflc6gfbq