Probability Distillation: A Caveat and Alternatives

Chin-Wei Huang, Faruk Ahmed, Kundan Kumar, Alexandre Lacoste, Aaron C. Courville
2019 Conference on Uncertainty in Artificial Intelligence  
We then explore alternative principles for distillation, including one with an "instructive" signal, and show that it is possible to achieve qualitatively better results than with KL minimization.  ...  Due to Van den Oord et al. (2018), probability distillation has recently been of interest to deep learning practitioners, where, as a practical workaround for deploying autoregressive models in real-time  ...  ACKNOWLEDGEMENTS Chin-Wei would like to thank Shawn Tan, and Kris Sankaran for discussion and feedback, and David Krueger for contributing to the idea of minimizing z-reconstruction for distillation.  ... 
dblp:conf/uai/HuangAKLC19 fatcat:krqgncoiujcyxmq35gursrf7la

Distilling Model Knowledge [article]

George Papamakarios
2015 arXiv   pre-print
In this thesis, we study knowledge distillation, the idea of extracting the knowledge contained in a complex model and injecting it into a more convenient model.  ...  We present a general framework for knowledge distillation, whereby a convenient model of our choosing learns how to mimic a complex model, by observing the latter's behaviour and being penalized whenever  ...  learning (and beyond!).  ... 
arXiv:1510.02437v1 fatcat:c32papt6hngcre4wkl3nygac7a

Entanglement distillation from quasifree Fermions [article]

Zoltan Kadar, Michael Keyl, Dirk Schlingemann
2011 arXiv   pre-print
We develop a scheme to distill entanglement from bipartite Fermionic systems in an arbitrary quasifree state.  ...  We show that the efficiency of the proposed scheme is in general very good and in some cases even optimal.  ...  There are, however, two important caveats.  ... 
arXiv:1003.2797v4 fatcat:nfaods7rbzbxvdnzt733lvy7m4

On the Efficacy of Knowledge Distillation

Jang Hyun Cho, Bharath Hariharan
2019 2019 IEEE/CVF International Conference on Computer Vision (ICCV)  
In this paper, we present a thorough evaluation of the efficacy of knowledge distillation and its dependence on student and teacher architectures.  ...  We find typical ways of circumventing this (such as performing a sequence of knowledge distillation steps) to be ineffective.  ...  We first test this claim on CIFAR with multiple networks and with both knowledge distillation and attention transfer (Table 4). We find that there are several caveats to Furlanello et al.'s result.  ... 
doi:10.1109/iccv.2019.00489 dblp:conf/iccv/ChoH19 fatcat:2cpeg6hkf5cn7bnbpc3cw7s3fi

Systematic distillation of composite Fibonacci anyons using one mobile quasiparticle [article]

Ben W. Reichardt
2012 arXiv   pre-print
Given the ability to pull nontrivial Fibonacci anyon pairs from the vacuum with a certain success probability, we show how to simulate universal quantum computation by braiding one quasiparticle and with  ...  We study a model in which both measurement and braiding capabilities are limited.  ...  5 in between five alternating applications of U and U†.  ... 
arXiv:1206.0330v1 fatcat:fvdqgfxrine3pfhhxtgiqvacb4

On the Efficacy of Knowledge Distillation [article]

Jang Hyun Cho, Bharath Hariharan
2019 arXiv   pre-print
In this paper, we present a thorough evaluation of the efficacy of knowledge distillation and its dependence on student and teacher architectures.  ...  We find typical ways of circumventing this (such as performing a sequence of knowledge distillation steps) to be ineffective.  ...  A particular way of using sequential knowledge distillation is as an alternative to ensembling to increase model accuracy [17, 6]. For example, Furlanello et al.  ... 
arXiv:1910.01348v1 fatcat:3zf2so2rxbdc7a5egpctvszrwu

Distilling the neural correlates of consciousness

Jaan Aru, Talis Bachmann, Wolf Singer, Lucia Melloni
2012 Neuroscience and Biobehavioral Reviews  
To unravel these neural correlates of consciousness (NCC) a common scientific strategy is to compare perceptual conditions in which consciousness of a particular content is present with those in which  ...  otherwise valid and valuable contrastive methodology.  ...  Schwiedrzik for helpful discussions and comments on a previous version of this manuscript. We also thank Felipe Aedo-Jury for suggesting the title of the paper.  ... 
doi:10.1016/j.neubiorev.2011.12.003 pmid:22192881 fatcat:fwgpm2t7frfdtklgv3rvdycqou

Towards Model Agnostic Federated Learning Using Knowledge Distillation [article]

Andrei Afonin, Sai Praneeth Karimireddy
2022 arXiv   pre-print
Our analysis shows that the degradation is largely due to a fundamental limitation of knowledge distillation under data heterogeneity.  ...  We further validate our framework by analyzing and designing new protocols based on KD.  ...  SPK is partly funded by an SNSF Fellowship and AA is funded by a research scholarship from MLO lab, EPFL headed by Martin Jaggi.  ... 
arXiv:2110.15210v2 fatcat:lphkm652xrc6jb5zuxczc65lue

Estimating and Maximizing Mutual Information for Knowledge Distillation [article]

Aman Shrivastava, Yanjun Qi, Vicente Ordonez
2021 arXiv   pre-print
Our method uses a contrastive objective to simultaneously estimate and maximize a lower bound on the mutual information of local and global feature representations between a teacher and a student network  ...  We are able to obtain 74.55% accuracy on CIFAR100 with a ShufflenetV2 from a baseline accuracy of 69.8% by distilling knowledge from ResNet-50.  ...  However, a caveat of this approach is that it requires comparing features from a large number of images simultaneously.  ... 
arXiv:2110.15946v2 fatcat:k2pzidd46jenpbqqchcetvy6h4

Autoregressive Knowledge Distillation through Imitation Learning [article]

Alexander Lin, Jeremy Wohlwend, Howard Chen, Tao Lei
2020 arXiv   pre-print
We develop a compression technique for autoregressive models that is driven by an imitation learning perspective on knowledge distillation.  ...  On prototypical language generation tasks such as translation and summarization, our method consistently outperforms other distillation algorithms, such as sequence-level knowledge distillation.  ...  Acknowledgments We thank the ASAPP NLP team, especially Yi Yang, Nicholas Matthews, Joshua Shapiro, Hugh Perkins, Amit Ganatra, Lili Yu, Xinyuan Zhang, and Yoav Artzi, as well as the EMNLP reviewers for  ... 
arXiv:2009.07253v2 fatcat:2pk5mt46zjemznssyle2zwxua4

Improving Neural Ranking via Lossless Knowledge Distillation [article]

Zhen Qin, Le Yan, Yi Tay, Honglei Zhuang, Xuanhui Wang, Michael Bendersky, Marc Najork
2022 arXiv   pre-print
We explore a novel perspective of knowledge distillation (KD) for learning to rank (LTR), and introduce Self-Distilled neural Rankers (SDR), where student rankers are parameterized identically to their  ...  The key success factors of SDR, which differs from common distillation techniques for classification are: (1) an appropriate teacher score transformation function, and (2) a novel listwise distillation  ...  Alternatives We discuss possible alternatives that we will compare with in experiments.  ... 
arXiv:2109.15285v2 fatcat:z4ptirsvp5dytj25mtcbjlafdu

One Model to Recognize Them All: Marginal Distillation from NER Models with Different Tag Sets [article]

Keunwoo Peter Yu, Yi Yang
2020 arXiv   pre-print
This paper presents a marginal distillation (MARDI) approach for training a unified NER model from resources with disjoint or heterogeneous tag sets.  ...  ., BiLSTM) and global models (e.g., CRF).  ...  To address the limitation, we instead propose an alternative strategy for knowledge distillation of a structured learning model, such as the CRF.  ... 
arXiv:2004.05140v2 fatcat:g4zrgjapj5eqdhkrsxxdb4ftc4

Strategic Behavior in Whiskey Distilling, 1887–1895

Karen Clay, Werner Troesken
2002 Journal of Economic History  
First, while federal antitrust enforcement played a minor role in the demise of the trust and its rebate scheme, state antitrust regulation was probably important.  ...  A caveat is in order, however: to the extent the trust initiated the rebate program because it had lost (gained) market power, these procedures will understate (overstate) its anticompetitive effects.  ... 
doi:10.1017/s0022050702001626 fatcat:7zwzsvlwxba7pi4mdleouwso74

Distilling GHZ States using Stabilizer Codes [article]

Narayanan Rengaswamy, Ankur Raina, Nithin Raveendran, Bane Vasić
2022 arXiv   pre-print
Entanglement distillation is a well-studied problem in quantum information, where one typically starts with n noisy Bell pairs and distills k Bell pairs of higher fidelity.  ...  Guided by this insight, we develop a GHZ distillation protocol based on local operations and classical communication that uses any stabilizer code.  ...  This research was carried out in part at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration and funded through JPL's  ... 
arXiv:2109.06248v2 fatcat:7d6r6ntfbjfmzmht6jul5x4cte

AWSD: Adaptive Weighted Spatiotemporal Distillation for Video Representation

Mohammad Tavakolian, Hamed Rezazadegan Tavakoli, Abdenour Hadid
2019 2019 IEEE/CVF International Conference on Computer Vision (ICCV)  
We propose an Adaptive Weighted Spatiotemporal Distillation (AWSD) technique for video representation by encoding the appearance and dynamics of the videos into a single RGB image map.  ...  including UCF101, HMDB51, ActivityNet v1.3, and Maryland.  ...  Acknowledgements The support of Academy of Finland, Infotech Oulu, Nokia, Tauno Tönning, and KAUTE Foundations is acknowledged.  ... 
doi:10.1109/iccv.2019.00811 dblp:conf/iccv/TavakolianTH19 fatcat:sald5iirq5evzeeexatcfvakai