A copy of this work was available on the public web and has been preserved in the Wayback Machine. The capture dates from 2021; you can also visit the original URL.
We present MATE-KD, a novel text-based adversarial training algorithm which improves the performance of knowledge distillation. ... MATE-KD first trains a masked language model-based generator to perturb text by maximizing the divergence between teacher and student logits. ... Table 5 presents the contribution of the generator and adversarial learning to MATE-KD. ...
arXiv:2105.05912v1
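The abstract above outlines the core adversarial step: a masked language model-based generator perturbs input tokens so as to maximize the divergence between teacher and student logits. Below is a minimal, self-contained PyTorch sketch of that maximization step. The toy modules, Gumbel-softmax sampling, masking rate, and all sizes are illustrative assumptions for exposition, not the authors' released implementation.

```python
# Hedged sketch of a MATE-KD-style generator (maximization) step.
# Toy modules stand in for the pretrained teacher, student, and MLM generator.
import torch
import torch.nn.functional as F

VOCAB, HIDDEN, NUM_CLASSES, SEQ_LEN = 1000, 64, 2, 16

class ToyClassifier(torch.nn.Module):
    """Stand-in teacher/student: soft (or one-hot) token mix -> class logits."""
    def __init__(self):
        super().__init__()
        self.emb = torch.nn.Linear(VOCAB, HIDDEN, bias=False)  # accepts soft tokens
        self.head = torch.nn.Linear(HIDDEN, NUM_CLASSES)
    def forward(self, token_probs):            # (B, T, VOCAB)
        h = self.emb(token_probs).mean(dim=1)  # mean-pool over positions
        return self.head(h)                    # (B, NUM_CLASSES)

class ToyGenerator(torch.nn.Module):
    """Stand-in MLM generator: a vocabulary distribution per position."""
    def __init__(self):
        super().__init__()
        self.emb = torch.nn.Embedding(VOCAB, HIDDEN)
        self.out = torch.nn.Linear(HIDDEN, VOCAB)
    def forward(self, token_ids):               # (B, T) int64
        return self.out(self.emb(token_ids))    # (B, T, VOCAB) logits

teacher, student, generator = ToyClassifier(), ToyClassifier(), ToyGenerator()
for p in teacher.parameters():                  # teacher stays frozen in KD
    p.requires_grad_(False)

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)

tokens = torch.randint(0, VOCAB, (8, SEQ_LEN))              # toy batch
mask = (torch.rand(8, SEQ_LEN) < 0.3).unsqueeze(-1)         # ~30% positions perturbed

# Generator step: sample replacement tokens differentiably (Gumbel-softmax)
# and ascend on the teacher-student divergence.
gen_logits = generator(tokens)
soft_tokens = F.gumbel_softmax(gen_logits, tau=1.0, hard=True)
one_hot = F.one_hot(tokens, VOCAB).float()
perturbed = torch.where(mask, soft_tokens, one_hot)         # perturb masked slots only

kl = F.kl_div(F.log_softmax(student(perturbed), dim=-1),
              F.softmax(teacher(perturbed), dim=-1),
              reduction="batchmean")
(-kl).backward()   # maximize divergence; only the generator is updated here
opt_g.step()
opt_g.zero_grad()
```

In the full algorithm this maximization step alternates with a minimization step that trains the student on the distillation loss over both clean and perturbed inputs; that second step is omitted here for brevity.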
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
doi:10.18653/v1/2021.acl-long.86
Knowledge distillation (KD) has enabled effective optimization of compact neural nets, achieving the best results when the knowledge of an expensive network is distilled via fresh task-specific unlabeled ... A language model (LM) is used to synthesize in-domain unlabeled data. Then, a classifier is used to annotate such data. ... MATE-KD: Masked adversarial text, a companion to knowledge distillation. arXiv preprint arXiv:2105.05912, 2021. Suman Ravuri and Oriol Vinyals. ...
arXiv:2106.06168v2
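For contrast, this citing entry describes a synthetic-data recipe rather than an adversarial one: a language model synthesizes in-domain unlabeled text, and a classifier annotates it before distillation. A hedged sketch of that pipeline follows, assuming Hugging Face `transformers` pipelines; the model names and prompts are illustrative assumptions, not those used in the cited work.

```python
# Hedged sketch: synthesize in-domain text with an LM, then pseudo-label it
# with a (teacher) classifier. Model choices here are placeholders.
from transformers import pipeline

# 1) Synthesize in-domain unlabeled text with a language model.
lm = pipeline("text-generation", model="gpt2")
prompts = ["The movie was", "I found the service"]
synthetic = [lm(p, max_new_tokens=30)[0]["generated_text"] for p in prompts]

# 2) Annotate the synthetic pool with the teacher classifier.
teacher = pipeline("text-classification",
                   model="distilbert-base-uncased-finetuned-sst-2-english")
pseudo_labeled = [(text, teacher(text)[0]["label"]) for text in synthetic]

# 3) A student would then be distilled on `pseudo_labeled`, e.g. with
#    cross-entropy on the pseudo-labels plus a KL term against the
#    teacher's soft logits (not shown here).
for text, label in pseudo_labeled:
    print(label, "|", text[:60])
```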