Agree to Disagree: Adaptive Ensemble Knowledge Distillation in Gradient Space

Shangchen Du, Shan You, Xiaojie Li, Jianlong Wu, Fei Wang, Chen Qian, Changshui Zhang
2020 Neural Information Processing Systems  
In this paper, we examine the diversity of teacher models in the gradient space and regard the ensemble knowledge distillation as a multi-objective optimization problem so that we can determine a better  ...  Distilling knowledge from an ensemble of teacher models is expected to have a more promising performance than that from a single one.  ...  In this paper, we propose a new adaptive ensemble knowledge distillation (AE-KD) method to encourage the comprehensive distillation from an ensemble.  ... 
dblp:conf/nips/DuYLW00Z20 fatcat:d2ram6echjcqpdr3i6wq5o7y3m

Does Knowledge Distillation Really Work? [article]

Samuel Stanton, Pavel Izmailov, Polina Kirichenko, Alexander A. Alemi, Andrew Gordon Wilson
2021 arXiv   pre-print
Knowledge distillation is a popular technique for training a small student network to emulate a larger teacher model, such as an ensemble of networks.  ...  We identify difficulties in optimization as a key reason for why the student is unable to match the teacher.  ...  The experiments in the main text consider m ∈ {1, 3, 5}, and we include results up to m = 12 in Appendix B.2.1  ...  3.1 Knowledge Distillation: Hinton et al. [20] proposed a simple approach to knowledge distillation  ... 
arXiv:2106.05945v2 fatcat:6i4d2owgtjasve6mhlovts5j6y

BaIT: Barometer for Information Trustworthiness [article]

Oisín Nolan, Jeroen van Mourik, Callum Tilbury
2022 arXiv   pre-print
Methods in data augmentation are explored as a means of tackling class imbalance in the dataset, employing common pre-existing methods and proposing a method for sample generation in the under-represented  ...  This paper presents a new approach to the FNC-1 fake news classification task which involves employing pre-trained encoder models from similar NLP tasks, namely sentence similarity and natural language  ...  Moreover, in NLI embedding space, the headline is close to B1, but quite far from B3, corresponding to the respective agree and disagree labels.  ...  α_i b_i (1)  ...  The body representation, b, is then a vector  ... 
arXiv:2206.07535v1 fatcat:c5n5ur6e65djha2vhyavx73as4

Manifold: A Model-Agnostic Framework for Interpretation and Diagnosis of Machine Learning Models

Jiawei Zhang, Yang Wang, Piero Molino, Lezhi Li, David S. Ebert
2018 IEEE Transactions on Visualization and Computer Graphics  
We present Manifold, a framework that utilizes visual analysis techniques to support interpretation, debugging, and comparison of machine learning models in a more transparent and interactive manner.  ...  Interpretation and diagnosis of machine learning models have gained renewed interest in recent years with breakthroughs in new approaches.  ...  In contrast, our technique supports more fine-grained functionality to drill down to specific symptom instances on which different models agree or disagree.  ... 
doi:10.1109/tvcg.2018.2864499 pmid:30130197 fatcat:bci4vui2m5f7boj3re3lv4rvsq

Deep Co-Training for Semi-Supervised Image Segmentation [article]

Jizong Peng, Guillermo Estrada, Marco Pedersoli, Christian Desrosiers
2019 arXiv   pre-print
In this paper, we aim to improve the performance of semantic image segmentation in a semi-supervised setting in which training is effectuated with a reduced set of annotated images and additional non-annotated  ...  Our results show that this ability to simultaneously train models, which exchange knowledge while preserving diversity, leads to state-of-the-art results on two challenging medical image datasets.  ...  Since models in the ensemble must agree for unlabeled images, and their prediction on labeled images is constrained by ground-truth segmentation masks, training images cannot be used directly to impose  ... 
arXiv:1903.11233v3 fatcat:pkg55nawqrg4lcyfhaho3vqogi

A Survey of Unsupervised Deep Domain Adaptation

Garrett Wilson, Diane J. Cook
2020 ACM Transactions on Intelligent Systems and Technology  
adaptation to reduce reliance on potentially-costly target data labels.  ...  As a complement to this challenge, single-source unsupervised domain adaptation can handle situations where a network is trained on labeled data from a source domain and unlabeled data from a related but  ...  The more models in the ensemble that agree, the higher the ensemble's confidence in that prediction.  ... 
doi:10.1145/3400066 pmid:34336374 pmcid:PMC8323662 fatcat:vh52rfgjgrc37kyctkqwlykq7e

A Survey of Unsupervised Deep Domain Adaptation [article]

Garrett Wilson, Diane J. Cook
2020 arXiv   pre-print
adaptation to reduce reliance on potentially-costly target data labels.  ...  As a complement to this challenge, single-source unsupervised domain adaptation can handle situations where a network is trained on labeled data from a source domain and unlabeled data from a related but  ...  The more models in the ensemble that agree, the higher the ensemble's confidence in that prediction.  ... 
arXiv:1812.02849v3 fatcat:paefg5cywbe3tjsp6dffnwkvxy

Understanding the Logit Distributions of Adversarially-Trained Deep Neural Networks [article]

Landan Seguin, Anthony Ndirango, Neeli Mishra, SueYeon Chung, Tyler Lee
2021 arXiv   pre-print
in AT models.  ...  Finally, we find learning information about incorrect classes to be essential to learning robustness by manipulating the non-max logit information during distillation and measuring the impact on the student's  ...  Figure .2, we visualize samples where the AT and ST networks either agree or disagree on high and low confidence images.  ... 
arXiv:2108.12001v1 fatcat:w6aabr7ugvaerh5u4i3xf4clfe

Image Classification with Deep Learning in the Presence of Noisy Labels: A Survey [article]

Görkem Algan, Ilkay Ulusoy
2021 arXiv   pre-print
Algorithms in the first group aim to estimate the noise structure and use this information to avoid the adverse effects of noisy labels.  ...  Because of these practical challenges, label noise is a common problem in real-world datasets, and numerous methods to train deep neural networks with label noise are proposed in the literature.  ...  They used distillation technique proposed in [163] for controlled transfer of knowledge from teacher to student.  ... 
arXiv:1912.05170v3 fatcat:k5zm5k5e5bevph2abvwxkwr7qm

Transferability in Machine Learning: from Phenomena to Black-Box Attacks using Adversarial Samples [article]

Nicolas Papernot and Patrick McDaniel and Ian Goodfellow
2016 arXiv   pre-print
learning classification systems from Amazon (96.19% misclassification rate) and Google (88.94%) using only 800 queries of the victim model, thereby showing that existing machine learning approaches are in  ...  Many machine learning models are vulnerable to adversarial examples: inputs that are specially crafted to cause a machine learning model to produce an incorrect output.  ...  Upon empirically exploring the input variation parameter space, we set it to ε = 0.3 for the fast gradient sign method algorithm, and ε = 1.5 for the SVM algorithm.  ... 
arXiv:1605.07277v1 fatcat:mlnntpsmbnfe3gahi7a3u77rlu

Domain Consistency Regularization for Unsupervised Multi-source Domain Adaptive Classification [article]

Zhipeng Luo, Xiaobing Zhang, Shijian Lu, Shuai Yi
2021 arXiv   pre-print
In this paper, we propose an end-to-end trainable network that exploits domain Consistency Regularization for unsupervised Multi-source domain Adaptive classification (CRMA).  ...  Deep learning-based multi-source unsupervised domain adaptation (MUDA) has been actively studied in recent years.  ...  Zhao et al. [13] proposed a multi-source distilling domain adaptation (MDDA) network to handle different similarities between multiple source domains and the target domain by generating a weighted ensemble  ... 
arXiv:2106.08590v1 fatcat:5rujqvspjjcj3ku4hfr7nbzpje

Knowledge Augmented Machine Learning with Applications in Autonomous Driving: A Survey [article]

Julian Wörmann, Daniel Bogdoll, Etienne Bührle, Han Chen, Evaristus Fuh Chuo, Kostadin Cvejoski, Ludger van Elst, Tobias Gleißner, Philip Gottschall, Stefan Griesche, Christian Hellert, Christian Hesels (+34 others)
2022 arXiv   pre-print
Leveraging additional, already existing sources of knowledge is key to overcome the limitations of purely data-driven approaches, and eventually to increase the generalization capability of these models  ...  This work provides an overview of existing techniques and methods in the literature that combine data-based models with existing knowledge.  ...  Adaptive loss constraint Random Incremental; Adam-NSCL [764], 2021, Projected Gradient, Project update in null space, -, Incremental; GPM [629], 2021, Projected Gradient, Orthogonal task updates, -, Incremental  ... 
arXiv:2205.04712v1 fatcat:u2bgxr2ctnfdjcdbruzrtjwot4

Learning more skills through optimistic exploration [article]

DJ Strouse, Kate Baumli, David Warde-Farley, Vlad Mnih, Steven Hansen
2022 arXiv   pre-print
., 2018) allow agents to learn rich repertoires of behavior in the absence of extrinsic rewards.  ...  To combat this inherent pessimism towards exploration, we derive an information gain auxiliary objective that involves training an ensemble of discriminators and rewarding the policy for their disagreement  ...  and plotting, Phil Bachman for correcting the error discussed in Appendix C, and David Schwab for pointing us to the original Query by Committee work (Seung et al., 1992) .  ... 
arXiv:2107.14226v6 fatcat:ybh2q3ldqfgjpmrcsy7oxbqzau

Evolving interpretable plasticity for spiking networks

Jakob Jordan, Maximilian Schmidt, Walter Senn, Mihai A Petrovici
2021 eLife  
Continuous adaptation allows survival in an ever-changing world.  ...  We successfully apply our approach to typical learning scenarios and discover previously unknown mechanisms for learning efficiently from rewards, recover efficient gradient-descent methods for learning  ...  Second, an evolutionary search does not need to compute gradients in the search space, thereby circumventing the need to estimate a gradient in non-differentiable systems.  ... 
doi:10.7554/elife.66273 pmid:34709176 pmcid:PMC8553337 fatcat:ekwsjg3gcndzbcgbuzb6y32q3y

Posterior Meta-Replay for Continual Learning [article]

Christian Henning, Maria R. Cervera, Francesco D'Angelo, Johannes von Oswald, Regina Traber, Benjamin Ehret, Seijin Kobayashi, Benjamin F. Grewe, João Sacramento
2021 arXiv   pre-print
In principle, Bayesian learning directly applies to this setting, since recursive and one-off Bayesian updates yield the same result.  ...  In practice, however, recursive updating often leads to poor trade-off solutions across tasks because approximate inference is necessary for most models of interest.  ...  We also would like to thank Sebastian Farquhar for discussions on Radial posteriors and for proofreading our implementation of this method.  ... 
arXiv:2103.01133v3 fatcat:4tjj74x74vew7gqif4atmg5qjm