
ASCNet: Self-supervised Video Representation Learning with Appearance-Speed Consistency [article]

Deng Huang, Wenhao Wu, Weiwen Hu, Xu Liu, Dongliang He, Zhihua Wu, Xiangmiao Wu, Mingkui Tan, Errui Ding
2021 arXiv   pre-print
We study self-supervised video representation learning, which is a challenging task due to 1) lack of labels for explicit supervision; 2) unstructured and noisy visual information.  ...  The appearance consistency task aims to maximize the similarity between two clips of the same video with different playback speeds.  ...  Self-supervised video representation learning aims to learn an encoder f(·; θ) to map the clip c_i to consistent feature x_i under different video augmentations.  ... 
arXiv:2106.02342v2 fatcat:y6tjnnupmfedjgy7cwanonc6ke

Auxiliary Learning for Self-Supervised Video Representation via Similarity-based Knowledge Distillation [article]

Amirhossein Dadashzadeh, Alan Whone, Majid Mirmehdi
2022 arXiv   pre-print
self-supervised representations.  ...  Despite the outstanding success of self-supervised pretraining methods for video representation learning, they generalise poorly when the unlabeled dataset for pretraining is small or the domain difference  ...  ASCNet achieves the most superior results with a combined appearance and speed manipulation approach.  ... 
arXiv:2112.04011v3 fatcat:kqy57av54fe3bcnqn6msje2mou

Motion-aware Contrastive Video Representation Learning via Foreground-background Merging [article]

Shuangrui Ding, Maomao Li, Tianyu Yang, Rui Qian, Haohang Xu, Qingyi Chen, Jue Wang, Hongkai Xiong
2022 arXiv   pre-print
In light of the success of contrastive learning in the image domain, current self-supervised video representation learning methods usually employ contrastive loss to facilitate video representation learning  ...  When naively pulling two augmented views of a video closer, the model however tends to learn the common static background as a shortcut but fails to capture the motion information, a phenomenon dubbed  ...  Self-supervised Video Representation Learning.  ... 
arXiv:2109.15130v3 fatcat:qy2voj5fxfbl3dla7svjpp54l4

Controllable Augmentations for Video Representation Learning [article]

Rui Qian, Weiyao Lin, John See, Dian Li
2022 arXiv   pre-print
This paper focuses on self-supervised video representation learning.  ...  We also introduce local-global temporal order dependency to further bridge the gap between clip-level and video-level representations for robust temporal modeling.  ...  Method The core idea of our proposed framework is to enhance self-supervised video representation learning by comprehensive appearance and motion content modeling.  ... 
arXiv:2203.16632v2 fatcat:zeqwmlv7pbg25oqzx6eu43hiri

Dual Contrastive Learning for Spatio-temporal Representation [article]

Shuangrui Ding, Rui Qian, Hongkai Xiong
2022 pre-print
Contrastive learning has shown promising potential in self-supervised spatio-temporal representation learning. Most works naively sample different clips to construct positive and negative pairs.  ...  We term our method as Dual Contrastive Learning for spatio-temporal Representation (DCLR).  ...  RELATED WORK Self-supervised Representation Learning.  ... 
doi:10.1145/3503161.3547783 arXiv:2207.05340v1 fatcat:yka5o3jdgreo5dhucwflc6gfbq