Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting [article]

Martine Toering, Ioannis Gatopoulos, Maarten Stol, Vincent Tao Hu
2021 arXiv   pre-print
In this paper we propose "Video Cross-Stream Prototypical Contrasting", a novel method which predicts consistent prototype assignments from both RGB and optical flow views, operating on sets of samples  ...  Instance-level contrastive learning techniques, which rely on data augmentation and a contrastive loss function, have found great success in the domain of visual representation learning.  ...  Supplementary Material for Self-supervised Video Representation Learning with Cross-Stream Prototypical Contrasting A.  ... 
arXiv:2106.10137v3 fatcat:yj3ph4he25dzjnivxedtug3itq

Comparing Learning Methodologies for Self-Supervised Audio-Visual Representation Learning

Hacene Terbouche, Liam Schoneveld, Oisin Benson, Alice Othmani
2022 IEEE Access  
In this paper, a new self-supervised approach is proposed for learning audio-visual representations from large databases of unlabeled videos.  ...  To implement these tasks, three methodologies are assessed: contrastive learning, prototypical contrasting and redundancy reduction.  ...  SELF-SUPERVISED VIDEO REPRESENTATION LEARNING Self-supervised learning has been also applied to videos due to the large availability of unlabeled videos on the web.  ...
doi:10.1109/access.2022.3164745 fatcat:3g7xld2h2bgtpp2fru2n4yam6a

Unsupervised Voice-Face Representation Learning by Cross-Modal Prototype Contrast [article]

Boqing Zhu, Kele Xu, Changjian Wang, Zheng Qin, Tao Sun, Huaimin Wang, Yuxing Peng
2022 arXiv   pre-print
We present an approach to learn voice-face representations from the talking face videos, without any identity labels.  ...  To address these issues, we propose the cross-modal prototype contrastive learning (CMPC), which takes advantage of contrastive methods and resists adverse effects of false negatives and deviate positives  ...  In [Nagrani et al., 2018a] , they learn the cross-modal representation in a self-supervised manner by the contrastive loss, and a curriculum learning schedule boosts the performance.  ... 
arXiv:2204.14057v2 fatcat:4pwn7qreojgotisjcwsc4yn6ii

Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization [article]

Rui Qian, Yuxi Li, Huabin Liu, John See, Shuangrui Ding, Xian Liu, Dian Li, Weiyao Lin
2021 arXiv   pre-print
The crux of self-supervised video representation learning is to build general features from unlabeled videos.  ...  Concretely, high-level features obtained from naive and prototypical contrastive learning are utilized to build distribution graphs, guiding the process of low-level and mid-level feature learning.  ...  Though contrastive self-supervised learning contributes to better representation, the temporal information in videos is not well leveraged.  ... 
arXiv:2108.02183v2 fatcat:ipbgwj6w6rfgxmhdxfgb5tzgdm

Beyond category-supervision: instance-level contrastive learning models predict human visual system responses to objects [article]

Talia Konkle, George A Alvarez
2021 bioRxiv   pre-print
The strong correspondence between category-supervised deep neural networks and ventral stream representation supports this view, but does not provide a viable learning model, as these deepnets rely upon  ...  Here we present a fully self-supervised model which instead learns to represent individual images, where views of the same image are embedded nearby in a low-dimensional feature space, distinctly from  ...  RESULTS Instance-prototype contrastive learning We designed an instance-prototype contrastive-learning algorithm (IPCL) to learn a representation of visual object information in a fully self-supervised  ... 
doi:10.1101/2021.05.28.446118 fatcat:mt47l7eq7najjjijp4klz6evfy

MoPro: Webly Supervised Learning with Momentum Prototypes [article]

Junnan Li, Caiming Xiong, Steven C.H. Hoi
2020 arXiv   pre-print
We propose momentum prototypes (MoPro), a simple contrastive learning method that achieves online label noise correction, out-of-distribution sample removal, and representation learning.  ...  We propose a webly-supervised representation learning method that does not suffer from the annotation unscalability of supervised learning, nor the computation unscalability of self-supervised learning  ...  The recent developments in self-supervised representation learning can be attributed to contrastive learning.  ... 
arXiv:2009.07995v1 fatcat:wj4zhjk6ybbv7osudgwjr7aaca

Unsupervised Few-Shot Action Recognition via Action-Appearance Aligned Meta-Adaptation [article]

Jay Patravali, Gaurav Mittal, Ye Yu, Fuxin Li, Mei Chen
2021 arXiv   pre-print
MetaUVFS leverages over 550K unlabeled videos to train a two-stream 2D and 3D CNN architecture via contrastive learning to capture the appearance-specific spatial and action-specific spatio-temporal video  ...  We present MetaUVFS as the first Unsupervised Meta-learning algorithm for Video Few-Shot action recognition.  ...  MetaUVFS uses a two-stream network to learn action and appearance-specific features via contrastive learning over 550K unlabeled videos.  ... 
arXiv:2109.15317v2 fatcat:36fm5sd6azhzpdcfr2hoh3lyjm

Self-supervised learning methods and applications in medical imaging analysis: A survey [article]

Saeed Shurrab, Rehab Duwairi
2021 arXiv   pre-print
Self-supervised learning is a recent training paradigm that enables learning robust representations without the need for human annotation which can be considered as an effective solution for the scarcity  ...  This article reviews the state-of-the-art research directions in self-supervised learning approaches for image data with concentration on their applications in the field of medical imaging analysis.  ...  Contrastive self-supervised learning Contrastive predictive coding Contrastive predictive coding (CPC), is a contrastive unsupervised representation learning proposed by Oord et al. [2018] that can fit  ... 
arXiv:2109.08685v2 fatcat:iu2zanqqrnaflawcxndb6xszgu

Noise-Tolerant Learning for Audio-Visual Action Recognition [article]

Haochen Han, Qinghua Zheng, Minnan Luo, Kaiyao Miao, Feng Tian, Yan Chen
2022 arXiv   pre-print
Recently, video recognition is emerging with the help of multi-modal learning, which focuses on integrating multiple modalities to improve the performance or robustness of a model.  ...  A noise-tolerant contrastive training phase is performed first to learn robust model parameters unaffected by the noisy labels.  ...  [3] propose a cross-modal self-supervision method to localize the audio object within an image. Humam et al.  ... 
arXiv:2205.07611v2 fatcat:lzf66dhinrfxjgo3lrw4c7lb2a

SOS! Self-supervised Learning Over Sets Of Handled Objects In Egocentric Action Recognition [article]

Victor Escorcia, Ricardo Guerrero, Xiatian Zhu, Brais Martinez
2022 arXiv   pre-print
To overcome both limitations, we introduce Self-Supervised Learning Over Sets (SOS), an approach to pre-train a generic Objects In Contact (OIC) representation model from video object regions detected  ...  Instead of augmenting object regions individually as in conventional self-supervised learning, we view the action process as a means of natural data transformations with unique spatio-temporal continuity  ...  Self-Supervised Object Representation Learning from Video Object Regions Object representations are typically learned as part of the standard object detector training pipeline.  ... 
arXiv:2204.04796v2 fatcat:2bc4lfweg5b65nkmhpnkgwfsqu

Contrastive Representation Learning: A Framework and Review [article]

Phuc H. Le-Khac, Graham Healy, Alan F. Smeaton
2020 arXiv   pre-print
Contrastive Learning has recently received interest due to its success in self-supervised representation learning in the computer vision domain.  ...  In this paper we provide a comprehensive literature review and we propose a general Contrastive Representation Learning framework that simplifies and unifies many different contrastive learning methods  ...  Therefore self-supervised contrastive representation learning methods usually require large batch sizes and longer training times than other supervised or self-supervised methods.  ... 
arXiv:2010.05113v1 fatcat:xdegcaoarvevdfl4r22pyzqr4e

Contrastive Representation Learning: A Framework and Review

Phuc H. Le-Khac, Graham Healy, Alan F. Smeaton
2020 IEEE Access  
Science Foundation Ireland through the SFI Centre for Research Training in Machine Learning (18/CRT/6183) and the Insight Centre for Data Analytics (SFI/12/RC/2289_P2).  ...  Therefore self-supervised contrastive representation learning methods usually require large batch sizes and longer training times than other supervised or self-supervised methods.  ...  [91] introduced the Time-Contrastive Network (TCN), a self-supervised method to learn a view-agnostic but time-sensitive representation from unlabelled videos.  ... 
doi:10.1109/access.2020.3031549 fatcat:qohhn2f2tray5ha3iafxbnwp74

Few-shot Action Recognition with Permutation-invariant Attention [article]

Hongguang Zhang, Li Zhang, Xiaojuan Qi, Hongdong Li, Philip H. S. Torr, Piotr Koniusz
2020 arXiv   pre-print
Many few-shot learning models focus on recognising images. In contrast, we tackle a challenging task of few-shot action recognition from videos.  ...  Importantly, to re-weight block contributions during pooling, we exploit spatial and temporal attention modules and self-supervision.  ...  Self-supervised learning leverages free supervision signals residing in images and videos to promote robust representation learning in image recognition [5, 4, 16] , video recognition [9, 40, 12] , video  ... 
arXiv:2001.03905v3 fatcat:bfj2xhgavvgete5m347gja6ney

Learning Spatiotemporal Features via Video and Text Pair Discrimination [article]

Tianhao Li, Limin Wang
2021 arXiv   pre-print
existing state-of-the-art self-supervised training methods.  ...  Current video representations heavily rely on learning from manually annotated video datasets which are time-consuming and expensive to acquire.  ...  RELATED WORK Self/Weakly Supervised Representation Learning. Self supervised representation was popular in both image and video domains by designing various proxy tasks.  ... 
arXiv:2001.05691v3 fatcat:hrybsmcveveh7flai3faesa5qa

Transformers in Vision: A Survey [article]

Salman Khan, Muzammal Naseer, Munawar Hayat, Syed Waqas Zamir, Fahad Shahbaz Khan, Mubarak Shah
2021 arXiv   pre-print
We start with an introduction to fundamental concepts behind the success of Transformers i.e., self-attention, large-scale pre-training, and bidirectional encoding.  ...  ., visual-question answering, visual reasoning, and visual grounding), video processing (e.g., activity recognition, video forecasting), low-level vision (e.g., image super-resolution, image enhancement  ...  We would also like to thank Mohamed Afham for his help with a figure.  ... 
arXiv:2101.01169v4 fatcat:ynsnfuuaize37jlvhsdki54cy4