A Closer Look at Codistillation for Distributed Training [article]

Shagun Sodhani, Olivier Delalleau, Mahmoud Assran, Koustuv Sinha, Nicolas Ballas, Michael Rabbat
2021 arXiv   pre-print
We investigate codistillation in a distributed training setup, complementing previous work which focused on extremely large batch sizes.  ...  Surprisingly, we find that even at moderate batch sizes, models trained with codistillation can perform as well as models trained with synchronous data-parallel methods, despite using a much weaker synchronization  ...  In this paper, we further investigate codistillation for distributed training.  ... 
arXiv:2010.02838v2 fatcat:s7ugrgic2rdfdik5rm2neuwwxe

Emerging Properties in Self-Supervised Vision Transformers [article]

Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin
2021 arXiv   pre-print
Our study also underlines the importance of momentum encoder, multi-crop training, and the use of small patches with ViTs.  ...  We implement our findings into a simple self-supervised method, called DINO, which we interpret as a form of self-distillation with no labels.  ...  We thank Mahmoud Assran, Matthijs Douze, Allan Jabri, Jure Zbontar, Alaaeldin El-Nouby, Y-Lan Boureau, Kaiming He, Thomas Lucas as well as the Thoth and FAIR teams for their help, support and discussions  ... 
arXiv:2104.14294v2 fatcat:72elv63ryjgktiieydcyw4chce

Emerging Properties in Self-Supervised Vision Transformers

Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, Armand Joulin
2021 2021 IEEE/CVF International Conference on Computer Vision (ICCV)  
We thank Mahmoud Assran, Matthijs Douze, Allan Jabri, Jure Zbontar, Alaaeldin El-Nouby, Y-Lan Boureau, Kaiming He, Thomas Lucas as well as the Thoth and FAIR teams for their help, support and discussions  ...  Class Representation As a final visualization, we propose to look at the distribution of ImageNet concepts in the feature space from DINO.  ...  We look at the attention map when using the [CLS] token as a query for the different heads in the last layer. Note that the [CLS] token is not attached to any label or supervision.  ... 
doi:10.1109/iccv48922.2021.00951 fatcat:iz4zfj7z25gxvpwhkqaiuf4dlm

Collaborative learning between cloud and end devices

Yan Lu, Yuanchao Shu, Xu Tan, Yunxin Liu, Mengyu Zhou, Qi Chen, Dan Pei
2019 Proceedings of the 4th ACM/IEEE Symposium on Edge Computing - SEC '19  
Our experiments also validate the efficiency of Colla, showing that one overnight training on a commodity smartphone can process one-year data from a typical smartphone, at the cost of 2000mWh and few  ...  Colla finds a middle ground to build tailored model for each device, leveraging local data and computation resources to update the model, while at the same time exploits cloud to aggregate and transfer  ...  Particularly, we take a closer look at collaboration performance between 100 most active devices (a.k.a., top-100). Data from the first month is used as initial training set for the cloud.  ... 
doi:10.1145/3318216.3363304 dblp:conf/edge/LuSTLZCP19 fatcat:caowfqxmazfypn46rvw4em4uiy

Compression of Deep Learning Models for Text: A Survey [article]

Manish Gupta, Puneet Agrawal
2021 arXiv   pre-print
NLP' community in the past few years and presents it as a coherent story.  ...  RNNs), Gated Recurrent Units (GRUs) and Long Short-Term Memory (LSTMs) networks, and Transformer [120] based models like Bidirectional Encoder Representations from Transformers (BERT) [24], Generative Pre-training  ...  Can we design model compression mechanisms aimed at looking at a tradeoff between model accuracy, size, latency and interpretability. • None of the model compression methods performs any application specific  ... 
arXiv:2008.05221v4 fatcat:6frf2wzi7zganaqgkuvy4szgmq

Canadian toxic chemical policy

John Robert Sturdy
1980
A pre-market strategy is necessary to establish priorities for control among the many chemicals posing a potential hazard.  ...  To aid in arriving at acceptable standards a consultative approach with government, industry and the public as participants was suggested.  ...  A network allows a base to be geographically closer to major users or data generators.  ... 
doi:10.14288/1.0094773 fatcat:2eupq57jz5eztffqz5eoqkjeae