
Corrupted Image Modeling for Self-Supervised Visual Pre-Training [article]

Yuxin Fang, Li Dong, Hangbo Bao, Xinggang Wang, Furu Wei
2022 arXiv   pre-print
We introduce Corrupted Image Modeling (CIM) for self-supervised visual pre-training.  ...  CIM is a general and flexible visual pre-training framework that is suitable for various network architectures.  ...  Acknowledgement We would like to acknowledge Yaru Hao for the helpful discussions.  ... 
arXiv:2202.03382v1 fatcat:fyg3ynyd7jeupjkcwam2y6os2u

Masked Image Modeling with Denoising Contrast [article]

Kun Yi, Yixiao Ge, Xiaotong Li, Shusheng Yang, Dian Li, Jianping Wu, Ying Shan, Xiaohu Qie
2022 arXiv   pre-print
Since the development of self-supervised visual representation learning from contrastive learning to masked image modeling, there is no significant difference in essence, that is, how to design proper  ...  We further strengthen the denoising mechanism with asymmetric designs, including image perturbations and model progress rates, to improve the network pre-training.  ...  Specifically, BEiT [1] introduces a new pretext task, namely, masked image modeling, for visual pre-training.  ... 
arXiv:2205.09616v1 fatcat:ckvzq6qydjhitj5pvzxb7diio4

Masked Frequency Modeling for Self-Supervised Visual Pre-Training [article]

Jiahao Xie, Wei Li, Xiaohang Zhan, Ziwei Liu, Yew Soon Ong, Chen Change Loy
2022 arXiv   pre-print
We present Masked Frequency Modeling (MFM), a unified frequency-domain-based approach for self-supervised pre-training of visual models.  ...  For the first time, MFM demonstrates that, for both ViT and CNN, a simple non-Siamese framework can learn meaningful representations even using none of the following: (i) extra data, (ii) extra model,  ...  in self-supervised pre-training of visual models.  ... 
arXiv:2206.07706v1 fatcat:3wgbyj5scrbahpiea5ptb5fb5y
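
The snippet only names the idea, so here is a minimal sketch of what masking frequency components could look like in PyTorch: move the image to the frequency domain with an FFT, zero out a band of frequencies, and return the corrupted image plus the mask of removed frequencies that a model would be trained to predict. The circular low-pass mask, radius, and loss target are illustrative assumptions, not MFM's exact recipe.

```python
import torch

def mask_frequencies(img, radius=16, keep_low=True):
    """Corrupt an image (C, H, W) by zeroing a band of its 2-D frequency components.

    Returns the corrupted image and the boolean mask of removed frequencies.
    Illustrative only: the real MFM masking strategy and loss may differ.
    """
    c, h, w = img.shape
    freq = torch.fft.fftshift(torch.fft.fft2(img), dim=(-2, -1))
    yy, xx = torch.meshgrid(
        torch.arange(h, dtype=torch.float32) - h / 2,
        torch.arange(w, dtype=torch.float32) - w / 2,
        indexing="ij",
    )
    low_pass = (yy ** 2 + xx ** 2) <= radius ** 2      # circle around the DC component
    removed = ~low_pass if keep_low else low_pass      # frequencies hidden from the model
    freq_masked = freq.masked_fill(removed, 0)
    corrupted = torch.fft.ifft2(torch.fft.ifftshift(freq_masked, dim=(-2, -1))).real
    return corrupted, removed

corrupted, removed = mask_frequencies(torch.rand(3, 224, 224))
# A network would take `corrupted` and regress the spectrum (or pixels) at `removed`.
```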

BEiT: BERT Pre-Training of Image Transformers [article]

Hangbo Bao, Li Dong, Songhao Piao, Furu Wei
2022 arXiv   pre-print
We introduce a self-supervised vision representation model BEiT, which stands for Bidirectional Encoder representation from Image Transformers.  ...  The pre-training objective is to recover the original visual tokens based on the corrupted image patches.  ...  The first one applies discriminative learning for pre-training, such as contrastive learning [CXH21], and self distillation [CTM+21].  ...
arXiv:2106.08254v2 fatcat:fafbljxuvjasdkqmhtimx4vc5e
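
The BEiT snippet states the objective (recover the original visual tokens of masked patches), so a compact sketch of that masked-patch prediction loss may help. The linear "tokenizer", two-layer transformer, masking ratio, and patch size below are placeholder assumptions standing in for BEiT's dVAE tokenizer and ViT backbone; only the shape of the loss follows the description.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical stand-ins: BEiT itself uses a dVAE visual tokenizer and a full ViT.
VOCAB, DIM, N_PATCH, PATCH_PIX = 8192, 768, 196, 16 * 16 * 3

tokenizer = nn.Linear(PATCH_PIX, VOCAB)          # placeholder "visual tokenizer"
patch_embed = nn.Linear(PATCH_PIX, DIM)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(DIM, nhead=12, batch_first=True), num_layers=2
)
head = nn.Linear(DIM, VOCAB)
mask_token = nn.Parameter(torch.zeros(1, 1, DIM))

def mim_loss(patches, mask_ratio=0.4):
    """patches: (B, N_PATCH, PATCH_PIX) flattened image patches."""
    with torch.no_grad():                        # targets come from the frozen tokenizer
        target_ids = tokenizer(patches).argmax(-1)            # (B, N)
    b, n, _ = patches.shape
    mask = torch.rand(b, n) < mask_ratio                      # True = corrupted patch
    x = patch_embed(patches)
    x = torch.where(mask.unsqueeze(-1), mask_token.expand(b, n, -1), x)
    logits = head(encoder(x))                                  # (B, N, VOCAB)
    return F.cross_entropy(logits[mask], target_ids[mask])     # loss on masked positions only

loss = mim_loss(torch.rand(2, N_PATCH, PATCH_PIX))
loss.backward()
```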

On visual self-supervision and its effect on model robustness [article]

Michal Kucer, Diane Oyen, Garrett Kenyon
2021 arXiv   pre-print
to l_2 and l_∞ adversarial perturbations and natural image corruptions.  ...  Although self-supervised pre-training yields benefits in improving adversarial training as compared to random weight initialization, we observe no benefit in model robustness or accuracy if self-supervision  ...  self-supervised pre-training is followed by AT.  ... 
arXiv:2112.04367v1 fatcat:lgoa4vaozfhaxgipkvbyo7famq
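
Since the entry evaluates robustness to l_2 and l_∞ adversarial perturbations, a one-step l_∞ (FGSM-style) perturbation is sketched below for context. It is a generic illustration, not the paper's evaluation protocol; `model`, `eps`, and the [0, 1] pixel range are assumptions.

```python
import torch
import torch.nn.functional as F

def fgsm_linf(model, x, y, eps=8 / 255):
    """One-step l_inf attack: x_adv = clip(x + eps * sign(grad_x loss)), inputs assumed in [0, 1]."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    return (x + eps * grad.sign()).clamp(0, 1).detach()

# Usage with any classifier `model` and a labelled batch (x, y):
#   x_adv = fgsm_linf(model, x, y)
#   robust_acc = (model(x_adv).argmax(1) == y).float().mean()
```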

Pre-Training Transformers for Domain Adaptation [article]

Burhan Ul Tayyab, Nicholas Chua
2021 arXiv   pre-print
The Visual Domain Adaptation Challenge 2021 called for unsupervised domain adaptation methods that could improve the performance of models by transferring the knowledge obtained from source datasets to  ...  In this paper, we utilize BeiT [1] and demonstrate its capability of capturing key attributes from source datasets and apply it to target datasets in a semi-supervised manner.  ...  The whole system is pre-trained on ImageNet-1k, where the Masked Image Modelling module is able to reconstruct the corrupted patch via self-supervised self-attention and thus is able to recognize and separate  ... 
arXiv:2112.09965v1 fatcat:7uecoevbqnck7ggzwtyygm6ntm

SiT: Self-supervised vIsion Transformer [article]

Sara Atito, Muhammad Awais, Josef Kittler
2021 arXiv   pre-print
We propose Self-supervised vIsion Transformers (SiT) and discuss several self-supervised training mechanisms to obtain a pretext model.  ...  In this work we investigate the merits of self-supervised learning for pretraining image/vision transformers and then using them for downstream classification tasks.  ...  Fig. 4: Performance of finetuning the self-supervised pre-trained models on different datasets as a function of the number of epochs for which the self-supervised model was pre-trained.  ...
arXiv:2104.03602v2 fatcat:leyl2xvhsnbflnwohrgigxsogy

SB-SSL: Slice-Based Self-Supervised Transformers for Knee Abnormality Classification from MRI [article]

Sara Atito, Syed Muhammad Anwar, Muhammad Awais, Josef Kittler
2022 arXiv   pre-print
Self-supervised learning (SSL) can be a solution for handling the lack of availability of ground truth labels, but generally requires a large amount of training data during the pretraining stage.  ...  Herein, we propose a slice-based self-supervised deep learning framework (SB-SSL), a novel slice-based paradigm for classifying abnormality using knee MRI scans.  ...  Whereas for self supervised training, which could alleviate this burden, the model performance drops.  ... 
arXiv:2208.13923v1 fatcat:f6rgkk5ghnex5n5qsts75xd7tu

Is Self-Supervised Learning More Robust Than Supervised Learning? [article]

Yuanyi Zhong, Haoran Tang, Junkun Chen, Jian Peng, Yu-Xiong Wang
2022 arXiv   pre-print
Self-supervised contrastive learning is a powerful tool to learn visual representation without labels.  ...  On the other hand, under pre-training corruptions, we find contrastive learning vulnerable to patch shuffling and pixel intensity change, yet less sensitive to dataset-level distribution change.  ...
arXiv:2206.05259v1 fatcat:b63riqhjnveh7i6v2ulrvdxbgu
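
The snippet names patch shuffling and pixel-intensity change as the pre-training corruptions that hurt contrastive learning most. Below is a minimal sketch of those two corruptions; the 32-pixel patch size and the scale/bias values are illustrative choices, not the paper's settings.

```python
import torch

def patch_shuffle(img, patch=32):
    """Randomly permute non-overlapping patch x patch blocks of a (C, H, W) image."""
    c, h, w = img.shape
    gh, gw = h // patch, w // patch
    blocks = img.unfold(1, patch, patch).unfold(2, patch, patch)   # (C, gh, gw, p, p)
    blocks = blocks.reshape(c, gh * gw, patch, patch)
    blocks = blocks[:, torch.randperm(gh * gw)]                    # shuffle block order
    blocks = blocks.reshape(c, gh, gw, patch, patch).permute(0, 1, 3, 2, 4)
    return blocks.reshape(c, h, w)

def intensity_shift(img, scale=0.5, bias=0.2):
    """Simple pixel-intensity change: rescale and shift, then clamp to [0, 1]."""
    return (img * scale + bias).clamp(0, 1)

corrupted = intensity_shift(patch_shuffle(torch.rand(3, 224, 224)))
```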

CAPT: Contrastive Pre-Training for Learning Denoised Sequence Representations [article]

Fuli Luo, Pengcheng Yang, Shicheng Li, Xuancheng Ren, Xu Sun
2020 arXiv   pre-print
Pre-trained self-supervised models such as BERT have achieved striking success in learning sequence representations, especially for natural language processing.  ...  In this way, it not only alleviates the pretrain-finetune discrepancy induced by the noise of pre-training, but also aids the pre-trained model in better capturing global semantics of the input via more  ...  On this account, the self-supervised representation model is pre-trained in a manner that is more applicable for noise-free data distribution.  ... 
arXiv:2010.06351v4 fatcat:2qpuj4bprnfifkmzqccvrx5dpm
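
CAPT contrasts representations of the original sequence with its corrupted version so that pre-training noise does not distort the learned semantics. A minimal InfoNCE-style sketch of that idea follows; the pooled encoder outputs and the temperature are placeholders, not CAPT's exact formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_denoising_loss(z_clean, z_corrupt, temperature=0.1):
    """InfoNCE between clean and corrupted sequence representations.

    z_clean, z_corrupt: (B, D) pooled encoder outputs for the original input and its
    noised version; each clean row should match its own corrupted row and repel the rest.
    """
    z_clean = F.normalize(z_clean, dim=-1)
    z_corrupt = F.normalize(z_corrupt, dim=-1)
    logits = z_clean @ z_corrupt.t() / temperature       # (B, B) cosine similarities
    targets = torch.arange(z_clean.size(0))              # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

loss = contrastive_denoising_loss(torch.randn(8, 256), torch.randn(8, 256))
```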

Masked Autoencoders that Listen [article]

Po-Yao Huang, Hu Xu, Juncheng Li, Alexei Baevski, Michael Auli, Wojciech Galuba, Florian Metze, Christoph Feichtenhofer
2022 arXiv   pre-print
Empirically, Audio-MAE sets new state-of-the-art performance on six audio and speech classification tasks, outperforming other recent models that use external supervised pre-training.  ...  This paper studies a simple extension of image-based Masked Autoencoders (MAE) to self-supervised representation learning from audio spectrograms.  ...  We thank Kaiming He and Luke Zettlemoyer for their feedback and discussions.  ... 
arXiv:2207.06405v2 fatcat:fdl5ftp7qbbvtlfkayg3snon3u
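
Audio-MAE transfers the MAE recipe from images to audio spectrograms. Below is a minimal sketch of the input side only: turning a waveform into a log-mel spectrogram and randomly dropping most patches before the encoder. The torchaudio front end, patch size, and 80% masking ratio are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torchaudio

def spectrogram_patches(wave, sr=16000, n_mels=128, patch=16, mask_ratio=0.8):
    """Waveform (1, T) -> log-mel patches with most of them dropped, MAE-style."""
    mel = torchaudio.transforms.MelSpectrogram(sample_rate=sr, n_mels=n_mels)(wave)
    logmel = torch.log(mel + 1e-6)                             # (1, n_mels, frames)
    f = (logmel.shape[1] // patch) * patch
    t = (logmel.shape[2] // patch) * patch
    logmel = logmel[:, :f, :t]                                 # crop so patches tile evenly
    patches = logmel.unfold(1, patch, patch).unfold(2, patch, patch)
    patches = patches.reshape(-1, patch * patch)               # (num_patches, patch*patch)
    n_keep = int(patches.size(0) * (1 - mask_ratio))
    keep = torch.randperm(patches.size(0))[:n_keep]            # indices of visible patches
    return patches[keep], keep

visible, idx = spectrogram_patches(torch.randn(1, 16000 * 10))
# An MAE-style encoder would see only `visible`; a light decoder reconstructs the rest.
```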

Pre-Trained Image Processing Transformer [article]

Hanting Chen, Yunhe Wang, Tianyu Guo, Chang Xu, Yiping Deng, Zhenhua Liu, Siwei Ma, Chunjing Xu, Chao Xu, Wen Gao
2021 arXiv   pre-print
The IPT model is trained on these images with multi-heads and multi-tails. In addition, contrastive learning is introduced to adapt well to different image processing tasks.  ...  To maximally excavate the capability of the transformer, we propose to utilize the well-known ImageNet benchmark for generating a large amount of corrupted image pairs.  ...  The IPT model is then trained using supervised and self-supervised approaches, which shows a strong ability to capture intrinsic features for low-level image processing.  ...
arXiv:2012.00364v4 fatcat:elfgar7pefcrzm5mouhsbgr2wq
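
IPT is pre-trained on corrupted/clean image pairs synthesized from ImageNet for several low-level tasks. A minimal sketch of generating such pairs is shown below; the specific degradations and parameters are illustrative, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F

def make_pair(clean, task):
    """clean: (B, 3, H, W) in [0, 1]. Returns (degraded_input, clean_target) for one task."""
    if task == "denoise":        # additive Gaussian noise, sigma ~ 30/255
        return (clean + 0.118 * torch.randn_like(clean)).clamp(0, 1), clean
    if task == "sr_x2":          # bicubic downsampling for 2x super-resolution
        lr = F.interpolate(clean, scale_factor=0.5, mode="bicubic", align_corners=False)
        return lr.clamp(0, 1), clean
    if task == "derain":         # crude synthetic rain streaks
        streaks = (torch.rand_like(clean[:, :1]) > 0.98).float()
        return (clean + streaks).clamp(0, 1), clean
    raise ValueError(task)

noisy, target = make_pair(torch.rand(4, 3, 48, 48), "denoise")
# Each task would be routed through a task-specific head/tail around the shared transformer body.
```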

GMML is All you Need [article]

Sara Atito, Muhammad Awais, Josef Kittler
2022 arXiv   pre-print
We propose group masked model learning (GMML), a self-supervised learning (SSL) mechanism for pretraining vision transformers with the ability to extract the contextual information present in all the concepts  ...  The key vehicle for the self-learning process used by the majority of self-learning methods is the generation of multiple views of the training data and the creation of pretext tasks which use these views  ...  For the optimisation of the self-supervised training, the model is trained using the Adam optimiser [38] with a momentum of 0.9.  ... 
arXiv:2205.14986v1 fatcat:rvf3d5evenh55jpvd6pbcv2ay4
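
GMML masks groups of connected patches so that recovering them requires context from the rest of the image. A minimal sketch of block-wise masking on a patch grid follows; the grid size, block size, and masking ratio are illustrative assumptions.

```python
import torch

def group_mask(grid=14, block=3, target_ratio=0.5):
    """Build a (grid, grid) boolean mask from random connected blocks of patches."""
    mask = torch.zeros(grid, grid, dtype=torch.bool)
    while mask.float().mean() < target_ratio:
        top = torch.randint(0, grid - block + 1, (1,)).item()
        left = torch.randint(0, grid - block + 1, (1,)).item()
        mask[top:top + block, left:left + block] = True
    return mask

mask = group_mask()
# Masked patch groups are replaced (e.g. with noise or a learned token) and the
# transformer is trained to reconstruct them from the visible context.
```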

Noise2Stack: Improving Image Restoration by Learning from Volumetric Data [article]

Mikhail Papkov, Kenny Roberts, Lee Ann Madissoon, Omer Bayraktar, Dmytro Fishman, Kaupo Palo, Leopold Parts
2020 arXiv   pre-print
Self-supervised methods, like Noise2Self and Noise2Void, relax data requirements by learning the signal without an explicit target but are limited by the lack of information in a single image.  ...  As a part of this work, we release a microscopy dataset to establish a benchmark for the multiplane image denoising.  ...  Using a single image stack for training in a self-supervised mode with four neighbours in the input, we outperform the original Noise2Noise model by 0.8 dB.  ... 
arXiv:2011.05105v1 fatcat:2m3qywfc2rdrpoqhgwae5qqfvu
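
Noise2Stack denoises a plane from its neighbouring planes in the volume, so no clean target is needed. A minimal sketch of one training step with four neighbours is given below; the single-convolution network and plain MSE loss are placeholders for the paper's actual architecture.

```python
import torch
import torch.nn as nn

net = nn.Conv2d(4, 1, kernel_size=3, padding=1)   # placeholder; a U-Net-style denoiser in practice

def noise2stack_step(stack, z):
    """stack: (D, H, W) noisy volume. Predict slice z from its four nearest neighbours."""
    neighbours = stack[[z - 2, z - 1, z + 1, z + 2]].unsqueeze(0)   # (1, 4, H, W)
    target = stack[z].unsqueeze(0).unsqueeze(0)                     # (1, 1, H, W)
    pred = net(neighbours)
    return torch.mean((pred - target) ** 2)   # a noisy target still yields a denoising estimator

loss = noise2stack_step(torch.rand(16, 128, 128), z=8)
loss.backward()
```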

Masked Unsupervised Self-training for Zero-shot Image Classification [article]

Junnan Li, Silvio Savarese, Steven C.H. Hoi
2022 arXiv   pre-print
On the other hand, models pre-trained with large-scale text-image supervision (e.g., CLIP) have enabled zero-shot transfer to downstream image classification tasks.  ...  We propose Masked Unsupervised Self-Training (MUST), a new approach which leverages two different and complementary sources of supervision: pseudo-labels and raw images.  ...  We use 16 A100 GPUs, and the training process takes significantly less time compared to CLIP pre-training or self-supervised pre-training.  ...
arXiv:2206.02967v1 fatcat:pmtonxgsu5f2dc7k2evgblxc6i
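
MUST combines pseudo-labels from a zero-shot CLIP-like model with self-training on unlabelled images. The sketch below shows confidence-thresholded pseudo-labelling only; the teacher/student models, threshold, and loss are placeholders, and MUST's full recipe also involves masked image modeling and other components not shown here.

```python
import torch
import torch.nn.functional as F

def pseudo_label_loss(teacher_logits, student_logits, threshold=0.7):
    """Keep only confident teacher predictions and train the student on them."""
    probs = teacher_logits.softmax(dim=-1)
    conf, pseudo = probs.max(dim=-1)
    keep = conf > threshold
    if keep.sum() == 0:
        return student_logits.sum() * 0.0     # no confident pseudo-labels in this batch
    return F.cross_entropy(student_logits[keep], pseudo[keep])

# `teacher_logits` would come from zero-shot CLIP prompts; `student_logits` from the
# model being adapted on unlabelled target images.
loss = pseudo_label_loss(torch.randn(32, 10), torch.randn(32, 10))
```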
Showing results 1 — 15 out of 12,775 results