Probability Distillation: A Caveat and Alternatives
2019
Conference on Uncertainty in Artificial Intelligence
We then explore alternative principles for distillation, including one with an "instructive" signal, and show that it is possible to achieve qualitatively better results than with KL minimization. ...
Due to Van den Oord et al. (2018), probability distillation has recently been of interest to deep learning practitioners, where, as a practical workaround for deploying autoregressive models in real-time ...
ACKNOWLEDGEMENTS Chin-Wei would like to thank Shawn Tan and Kris Sankaran for discussion and feedback, and David Krueger for contributing to the idea of minimizing z-reconstruction for distillation. ...
dblp:conf/uai/HuangAKLC19
fatcat:krqgncoiujcyxmq35gursrf7la
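The snippet above refers to distilling a slow autoregressive teacher density into a fast student by minimizing a KL divergence. Below is a minimal sketch of that objective on a toy discrete distribution (NumPy only; the function name and example values are illustrative, not taken from the paper):

```python
import numpy as np

def reverse_kl(student_probs, teacher_probs, eps=1e-12):
    """KL(q_student || p_teacher) for discrete distributions.

    Probability distillation trains the student by minimizing a divergence of
    this form, with expectations taken under samples from the student.
    """
    q = np.asarray(student_probs, dtype=float)
    p = np.asarray(teacher_probs, dtype=float)
    return float(np.sum(q * (np.log(q + eps) - np.log(p + eps))))

# Toy example: a bimodal teacher and a student that covers mostly one mode.
teacher = np.array([0.45, 0.05, 0.05, 0.45])
student = np.array([0.85, 0.05, 0.05, 0.05])
print(reverse_kl(student, teacher))
```

The reverse KL stays finite even when the student largely ignores one of the teacher's modes, a mode-seeking behaviour often cited as a caveat of KL-based distillation.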
Distilling Model Knowledge
[article]
2015
arXiv
pre-print
In this thesis, we study knowledge distillation, the idea of extracting the knowledge contained in a complex model and injecting it into a more convenient model. ...
We present a general framework for knowledge distillation, whereby a convenient model of our choosing learns how to mimic a complex model, by observing the latter's behaviour and being penalized whenever ...
learning (and beyond!). ...
arXiv:1510.02437v1
fatcat:c32papt6hngcre4wkl3nygac7a
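The mimicry setup this abstract describes (a student penalized whenever it deviates from the teacher's behaviour) is most commonly implemented as a soft-target cross-entropy with a temperature. A minimal sketch, assuming the standard soft-label formulation rather than the thesis's exact objective:

```python
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()                 # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Cross-entropy of temperature-softened teacher targets under the student.

    The student is penalized exactly when its softened predictions deviate
    from the teacher's, which is the generic mimicry idea described above.
    """
    p_teacher = softmax(teacher_logits, T)
    log_q_student = np.log(softmax(student_logits, T) + 1e-12)
    return float(-np.sum(p_teacher * log_q_student))

print(distillation_loss([2.0, 0.5, -1.0], [1.5, 1.0, -0.5]))
```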
Entanglement distillation from quasifree Fermions
[article]
2011
arXiv
pre-print
We develop a scheme to distill entanglement from bipartite Fermionic systems in an arbitrary quasifree state. ...
We show that the efficiency of the proposed scheme is in general very good and in some cases even optimal. ...
There are, however, two important caveats. ...
arXiv:1003.2797v4
fatcat:nfaods7rbzbxvdnzt733lvy7m4
On the Efficacy of Knowledge Distillation
2019
2019 IEEE/CVF International Conference on Computer Vision (ICCV)
In this paper, we present a thorough evaluation of the efficacy of knowledge distillation and its dependence on student and teacher architectures. ...
We find typical ways of circumventing this (such as performing a sequence of knowledge distillation steps) to be ineffective. ...
We first test this claim on CIFAR with multiple networks and with both knowledge distillation and attention transfer (Table 4). We find that there are several caveats to Furlanello et al.'s result. ...
doi:10.1109/iccv.2019.00489
dblp:conf/iccv/ChoH19
fatcat:2cpeg6hkf5cn7bnbpc3cw7s3fi
Systematic distillation of composite Fibonacci anyons using one mobile quasiparticle
[article]
2012
arXiv
pre-print
Given the ability to pull nontrivial Fibonacci anyon pairs from the vacuum with a certain success probability, we show how to simulate universal quantum computation by braiding one quasiparticle and with ...
We study a model in which both measurement and braiding capabilities are limited. ...
5 in between five alternating applications of U and U†. ...
arXiv:1206.0330v1
fatcat:fvdqgfxrine3pfhhxtgiqvacb4
On the Efficacy of Knowledge Distillation
[article]
2019
arXiv
pre-print
In this paper, we present a thorough evaluation of the efficacy of knowledge distillation and its dependence on student and teacher architectures. ...
We find typical ways of circumventing this (such as performing a sequence of knowledge distillation steps) to be ineffective. ...
A particular way of using sequential knowledge distillation is as an alternative to ensembling to increase model accuracy [17, 6]. For example, Furlanello et al. ...
arXiv:1910.01348v1
fatcat:3zf2so2rxbdc7a5egpctvszrwu
Distilling the neural correlates of consciousness
2012
Neuroscience and Biobehavioral Reviews
To unravel these neural correlates of consciousness (NCC), a common scientific strategy is to compare perceptual conditions in which consciousness of a particular content is present with those in which ...
otherwise valid and valuable contrastive methodology. ...
Schwiedrzik for helpful discussions and comments on a previous version of this manuscript. We also thank Felipe Aedo-Jury for suggesting the title of the paper. ...
doi:10.1016/j.neubiorev.2011.12.003
pmid:22192881
fatcat:fwgpm2t7frfdtklgv3rvdycqou
Towards Model Agnostic Federated Learning Using Knowledge Distillation
[article]
2022
arXiv
pre-print
Our analysis shows that the degradation is largely due to a fundamental limitation of knowledge distillation under data heterogeneity. ...
We further validate our framework by analyzing and designing new protocols based on KD. ...
SPK is partly funded by an SNSF Fellowship and AA is funded by a research scholarship from MLO lab, EPFL headed by Martin Jaggi. ...
arXiv:2110.15210v2
fatcat:lphkm652xrc6jb5zuxczc65lue
Estimating and Maximizing Mutual Information for Knowledge Distillation
[article]
2021
arXiv
pre-print
Our method uses a contrastive objective to simultaneously estimate and maximize a lower bound on the mutual information of local and global feature representations between a teacher and a student network ...
We are able to obtain 74.55% accuracy on CIFAR100 with a ShufflenetV2 from a baseline accuracy of 69.8% by distilling knowledge from ResNet-50. ...
However, a caveat of this approach is that it requires comparing features from a large number of images simultaneously. ...
arXiv:2110.15946v2
fatcat:k2pzidd46jenpbqqchcetvy6h4
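The contrastive objective this abstract mentions is typically an InfoNCE-style bound computed over a batch of paired teacher and student features. A minimal NumPy sketch under that assumption (shapes, names, and the temperature are illustrative, not the authors' code):

```python
import numpy as np

def info_nce(student_feats, teacher_feats, temperature=0.1):
    """InfoNCE-style lower bound on the mutual information of paired features.

    Row i of each matrix comes from the same input; matching rows are
    positives and every other row in the batch serves as a negative, which
    is why objectives of this kind need many images compared simultaneously
    (the caveat noted in the snippet above).
    """
    s = student_feats / np.linalg.norm(student_feats, axis=1, keepdims=True)
    t = teacher_feats / np.linalg.norm(teacher_feats, axis=1, keepdims=True)
    logits = s @ t.T / temperature                     # pairwise similarities
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))         # minimize to tighten the bound

rng = np.random.default_rng(0)
student = rng.normal(size=(8, 16))
teacher = student + 0.1 * rng.normal(size=(8, 16))     # correlated features
print(info_nce(student, teacher))
```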
Autoregressive Knowledge Distillation through Imitation Learning
[article]
2020
arXiv
pre-print
We develop a compression technique for autoregressive models that is driven by an imitation learning perspective on knowledge distillation. ...
On prototypical language generation tasks such as translation and summarization, our method consistently outperforms other distillation algorithms, such as sequence-level knowledge distillation. ...
Acknowledgments We thank the ASAPP NLP team, especially Yi Yang, Nicholas Matthews, Joshua Shapiro, Hugh Perkins, Amit Ganatra, Lili Yu, Xinyuan Zhang, and Yoav Artzi, as well as the EMNLP reviewers for ...
arXiv:2009.07253v2
fatcat:2pk5mt46zjemznssyle2zwxua4
Improving Neural Ranking via Lossless Knowledge Distillation
[article]
2022
arXiv
pre-print
We explore a novel perspective of knowledge distillation (KD) for learning to rank (LTR), and introduce Self-Distilled neural Rankers (SDR), where student rankers are parameterized identically to their ...
The key success factors of SDR, which differ from common distillation techniques for classification, are: (1) an appropriate teacher score transformation function, and (2) a novel listwise distillation ...
Alternatives We discuss possible alternatives that we will compare with in experiments. ...
arXiv:2109.15285v2
fatcat:z4ptirsvp5dytj25mtcbjlafdu
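The two ingredients this abstract lists, a teacher score transformation and a listwise distillation loss, can be sketched generically as a softmax cross-entropy over one query's candidate scores. The transform below is a stand-in, not the function defined by the paper:

```python
import numpy as np

def listwise_distill_loss(student_scores, teacher_scores, transform=np.tanh):
    """Generic listwise distillation: softmax cross-entropy over a ranked list.

    `transform` is a placeholder for the teacher score transformation; the
    exact choice used by SDR is not reproduced here.
    """
    def list_softmax(x):
        x = np.asarray(x, dtype=float)
        x = x - x.max()
        e = np.exp(x)
        return e / e.sum()

    p_teacher = list_softmax(transform(np.asarray(teacher_scores, dtype=float)))
    log_q_student = np.log(list_softmax(student_scores) + 1e-12)
    return float(-np.sum(p_teacher * log_q_student))

# One query with four candidate documents (scores are illustrative).
print(listwise_distill_loss([0.2, 1.3, -0.4, 0.8], [0.5, 2.0, -1.0, 1.0]))
```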
One Model to Recognize Them All: Marginal Distillation from NER Models with Different Tag Sets
[article]
2020
arXiv
pre-print
This paper presents a marginal distillation (MARDI) approach for training a unified NER model from resources with disjoint or heterogeneous tag sets. ...
(e.g., BiLSTM) and global models (e.g., CRF). ...
To address the limitation, we instead propose an alternative strategy for knowledge distillation of a structured learning model, such as the CRF. ...
arXiv:2004.05140v2
fatcat:g4zrgjapj5eqdhkrsxxdb4ftc4
Strategic Behavior in Whiskey Distilling, 1887–1895
2002
Journal of Economic History
First, while federal antitrust enforcement played a minor role in the demise of the trust and its rebate scheme, state antitrust regulation was probably important. ...
A caveat is in order, however: to the extent the trust initiated the rebate program because it had lost (gained) market power, these procedures will understate (overstate) its anticompetitive effects. ...
doi:10.1017/s0022050702001626
fatcat:7zwzsvlwxba7pi4mdleouwso74
Distilling GHZ States using Stabilizer Codes
[article]
2022
arXiv
pre-print
Entanglement distillation is a well-studied problem in quantum information, where one typically starts with n noisy Bell pairs and distills k Bell pairs of higher fidelity. ...
Guided by this insight, we develop a GHZ distillation protocol based on local operations and classical communication that uses any stabilizer code. ...
This research was carried out in part at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration and funded through JPL's ...
arXiv:2109.06248v2
fatcat:7d6r6ntfbjfmzmht6jul5x4cte
AWSD: Adaptive Weighted Spatiotemporal Distillation for Video Representation
2019
2019 IEEE/CVF International Conference on Computer Vision (ICCV)
We propose an Adaptive Weighted Spatiotemporal Distillation (AWSD) technique for video representation by encoding the appearance and dynamics of the videos into a single RGB image map. ...
including UCF101, HMDB51, ActivityNet v1.3, and Maryland. ...
Acknowledgements The support of Academy of Finland, Infotech Oulu, Nokia, Tauno Tönning, and KAUTE Foundations is acknowledged. ...
doi:10.1109/iccv.2019.00811
dblp:conf/iccv/TavakolianTH19
fatcat:sald5iirq5evzeeexatcfvakai