







175,778 Hits in 3.0 sec

Big Transfer (BiT): General Visual Representation Learning [article]

Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby
2020 arXiv pre-print
Transfer of pre-trained representations improves sample efficiency and simplifies hyperparameter tuning when training deep neural networks for vision.  ...  We revisit the paradigm of pre-training on large supervised datasets and fine-tuning the model on a target task. We scale up pre-training, and propose a simple recipe that we call Big Transfer (BiT).  ...  In particular, we thank Andrei Giurgiu for finding a bug in our data input pipeline, Marcin Michalski for the naming idea and general helpful advice, and Damien Vincent and Daniel Keysers for detailed  ... 
arXiv:1912.11370v3 fatcat:oesjnhydwvbtfarsssmn5a2p7m

SoundNet: Learning Sound Representations from Unlabeled Video [article]

Yusuf Aytar, Carl Vondrick, Antonio Torralba
2016 arXiv pre-print
We learn rich natural sound representations by capitalizing on large amounts of unlabeled sound data collected in the wild.  ...  Our sound representation yields significant performance improvements over the state-of-the-art results on standard benchmarks for acoustic scene/object classification.  ...  Acknowledgements: We thank MIT TIG, especially Garrett Wollman, for helping store 26 TB of video. We are grateful for the GPUs donated by NVIDIA.  ... 
arXiv:1610.09001v1 fatcat:t36x2kcktfcppass5ib4omty6i

Multi-Task Self-Training for Learning General Representations [article]

Golnaz Ghiasi, Barret Zoph, Ekin D. Cubuk, Quoc V. Le, Tsung-Yi Lin
2021 arXiv pre-print
The results suggest self-training is a promising direction to aggregate labeled and unlabeled training data for learning general feature representations.  ...  Despite the fast progress in training specialized models for various tasks, learning a single general model that works well for many tasks is still challenging for computer vision.  ...  Acknowledgements: We would like to thank Yin Cui, Aravind Srinivas, Simon Kornblith, and Ting Chen for valuable feedback.  ... 
arXiv:2108.11353v1 fatcat:i2bt3bxbxzfsjgvn7k53x3mlry

Damage detection using in-domain and cross-domain transfer learning [article]

Zaharah A. Bukhsh, Nils Jansen, Aaqib Saeed
2021 arXiv pre-print
Typical image datasets for such problems are relatively small, calling for the transfer of learned representation from a related large-scale dataset.  ...  However, there are rising concerns about the generalizability of ImageNet representations for specific target domains, such as for visual inspection and medical imaging.  ...  The large-scale unlabeled datasets from visual inspections call for state-of-the-art machine learning methods. In particular, we focus on damage detection of concrete surfaces using visual data.  ... 
arXiv:2102.03858v1 fatcat:y5u3u2ylivb53fzd5ms35riczq

Do sound event representations generalize to other audio tasks? A case study in audio transfer learning [article]

Anurag Kumar, Yun Wang, Vamsi Krishna Ithapu, Christian Fuegen
2021 arXiv pre-print
A simple, yet effective transfer learning approach utilizes deep neural networks trained on a large-scale task for feature extraction.  ...  In this paper, we investigate the transfer learning capacity of audio representations obtained from neural networks trained on a large-scale sound event detection dataset.  ...  We train neural networks for a large-scale SED task and transfer the representations obtained from these networks for any given audio to the target tasks.  ... 
arXiv:2106.11335v1 fatcat:el5xqdst5nerbkatid2vk7j4um

Tencent ML-Images: A Large-Scale Multi-Label Image Database for Visual Representation Learning [article]

Baoyuan Wu, Weidong Chen, Yanbo Fan, Yong Zhang, Jinlong Hou, Jie Liu, Tong Zhang
2019 arXiv pre-print
The good quality of the visual representation of the Tencent ML-Images checkpoint is verified through three transfer learning tasks, including single-label image classification on ImageNet and Caltech-  ...  We efficiently train the ResNet-101 model with multi-label outputs on Tencent ML-Images, taking 90 hours for 60 epochs, based on a large-scale distributed deep learning framework, i.e., TFplus.  ...  The above discussions explain why we need a large-scale multi-label image database for visual representation learning with deep neural networks.  ... 
arXiv:1901.01703v6 fatcat:eysgbytnkfd5jh3loj777yfuw4

Large-Scale Social Multimedia Analysis [chapter]

Benjamin Bischke, Damian Borth, Andreas Dengel
2019 Big Data Analytics for Large-Scale Multimedia Search  
In 2012, Krizhevsky [11] applied deep convolutional networks on the ImageNet dataset¹, and their AlexNet achieved breakthrough accuracy in the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC  ...  This transfer representation learning approach is critical  ...  ¹ ImageNet is a dataset of over 15 million labeled images belonging to roughly 22,000 categories [12]. ² The ILSVRC [13] evaluates algorithms  ...  For scale of computation, consult pioneering work at Google [74], and subsequent efforts on accelerating deep learning such as DistBelief [75], Adam [76], and SpeeDO [46].  ... 
doi:10.1002/9781119376996.ch6 fatcat:dw4rzuqeanbvxmaabtsgrid2ty

Detector discovery in the wild: Joint multiple instance and representation learning

Judy Hoffman, Deepak Pathak, Trevor Darrell, Kate Saenko
2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
We introduce a new model for large-scale learning of detectors that can jointly exploit weak and strong labels, perform inference over latent regions in weakly labeled training examples, and can transfer  ...  It is well known that contemporary visual models thrive on large amounts of training data, especially those that directly include labels for desired tasks.  ... 
doi:10.1109/cvpr.2015.7298906 dblp:conf/cvpr/HoffmanPDS15 fatcat:ahrdlnnvjbcsleghxidjcpnuha

Unsupervised Finetuning [article]

Suichan Li, Dongdong Chen, Yinpeng Chen, Lu Yuan, Lei Zhang, Qi Chu, Bin Liu, Nenghai Yu
2021 arXiv pre-print
This problem is more challenging than the supervised counterpart, as the low data density in the small-scale target data is not friendly for unsupervised learning, leading to the damage of the pretrained  ...  the motivation of the latter strategy is to increase the data density and help learn more compact representations.  ...  Introduction: In recent years, visual recognition has achieved tremendous success [30, 19, 31] from the development of deep neural networks and large-scale labeled data.  ... 
arXiv:2110.09510v1 fatcat:vx6pustwkzb3doheu6aqb3qpsi

Smile, Be Happy :) Emoji Embedding for Visual Sentiment Analysis [article]

Ziad Al-Halah, Andrew Aitken, Wenzhe Shi, Jose Caballero
2020 arXiv pre-print
Due to the lack of large-scale datasets, the prevailing approach in visual sentiment analysis is to leverage models trained for object classification in large datasets like ImageNet.  ...  However, objects are sentiment neutral, which hinders the expected gain of transfer learning for such tasks.  ...  To that end, we introduce a large-scale benchmark for visual emoji prediction (Sec. 3.1) along with a deep neural model for efficient emoji embedding and transfer learning (Sec. 3.2).  ... 
arXiv:1907.06160v3 fatcat:sxfq4z5ggvb5jf3x4frid7wwfu

RegionCLIP: Region-based Language-Image Pretraining [article]

Yiwu Zhong, Jianwei Yang, Pengchuan Zhang, Chunyuan Li, Noel Codella, Liunian Harold Li, Luowei Zhou, Xiyang Dai, Lu Yuan, Yin Li, Jianfeng Gao
2021 arXiv pre-print
Moreover, the learned region representations support zero-shot inference for object detection, showing promising results on both COCO and LVIS datasets.  ...  To mitigate this issue, we propose a new method called RegionCLIP that significantly extends CLIP to learn region-level visual representations, thus enabling fine-grained alignment between image regions  ...  COCO, Generalized (17+48), Novel / Base / All: 22.6 / 58.5 / 49.1 and 31.4 / 57.1 / 50.4. Table 10. Ablation study on effects of focal scaling during transfer learning for object detection.  ... 
arXiv:2112.09106v1 fatcat:3pypzvqrhnhodkn5iz26qbguj4

Laplacian Denoising Autoencoder [article]

Jianbo Jiao, Linchao Bao, Yunchao Wei, Shengfeng He, Honghui Shi, Rynson Lau, Thomas S. Huang
2020 arXiv pre-print
While deep neural networks have been shown to perform remarkably well in many machine learning tasks, labeling a large amount of ground truth data for supervised training is usually very costly to scale  ...  This can be naturally generalized to span multiple scales with a Laplacian pyramid representation of the input data.  ...  on large-scale data (e.g., ImageNet [17] ) and transferring the learned representations to a variety of downstream vision tasks including multi-label classification, object detection, and semantic segmentation  ... 
arXiv:2003.13623v1 fatcat:uel6btlupzbarey7rvr6i6ya7i

Transferring Visual Prior for Online Object Tracking

Qing Wang, Feng Chen, Jimei Yang, Wenli Xu, Ming-Hsuan Yang
2012 IEEE Transactions on Image Processing  
Visual prior from generic real-world images can be learned and transferred for representing objects in a scene.  ...  Motivated by this, we propose an algorithm that transfers visual prior learned offline for online object tracking.  ...  Wang was a visiting student at the University of California at Merced.  ... 
doi:10.1109/tip.2012.2190085 pmid:22491081 fatcat:obwthxkxqvag5eo6rwtqsaqpp4

Multimodal Transfer: A Hierarchical Deep Convolutional Neural Network for Fast Artistic Style Transfer [article]

Xin Wang, Geoffrey Oxholm, Da Zhang, Yuan-Fang Wang
2017 arXiv pre-print
That is, our scheme can generate results that are visually pleasing and more similar to multiple desired artistic styles with color and texture cues at multiple scales.  ...  By properly handling style and texture cues at multiple scales using several modalities, we can transfer not just large-scale, obvious style cues but also subtle, exquisite ones.  ...  [8] , where a pre-trained deep learning network for visual recognition is used to capture both style and content representations, and achieves visually stunning results.  ... 
arXiv:1612.01895v2 fatcat:eq7pgegpdjddpcmafchye2otce

Representation Learning on Large and Small Data [article]

Chun-Nan Chou, Chuen-Kai Shie, Fu-Chieh Chang, Jocelyn Chang, Edward Y. Chang
2017 arXiv pre-print
Deep learning owes its success to three key factors: scale of data, enhanced models to learn representations from data, and scale of computation.  ...  We addressed the first question by presenting CNN model enhancements in the aspects of representation, optimization, and generalization.  ...  For scale of computation, please consult pioneering work at Google [8] , and subsequent efforts on accelerating deep learning such as DistBelief [18] , Adam [13] , and SpeeDO [69] .  ... 
arXiv:1707.09873v1 fatcat:lhrqlkdfcrfgtn6rluyotvyn4u