3,404 Hits in 4.2 sec

Soft-Label Dataset Distillation and Text Dataset Distillation [article]

Ilia Sucholutsky, Matthias Schonlau
2020 arXiv   pre-print
We propose to simultaneously distill both images and their labels, thus assigning each synthetic sample a 'soft' label (a distribution of labels).  ...  We also extend the dataset distillation algorithm to distill sequential datasets including texts. We demonstrate that text distillation outperforms other methods across multiple datasets.  ...  Our soft-label dataset distillation (SLDD) algorithm also uses 'soft' labels but these are persistent and learned over the training phase of a network (rather than being produced during the inference phase  ... 
arXiv:1910.02551v3 fatcat:65ybpaulczbubepah7dchlh6vq
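The core idea in this entry, that each distilled sample carries a learned label *distribution* rather than a one-hot class, can be illustrated with a minimal numpy sketch. The values below are hypothetical, and this is not the paper's implementation, only the soft-label cross-entropy such a method would optimize:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def soft_cross_entropy(logits, soft_label):
    """Cross-entropy against a 'soft' label (a full distribution over
    classes), the loss a learned-label distilled sample would be trained
    and evaluated with."""
    p = softmax(logits)
    return -np.sum(soft_label * np.log(p + 1e-12))

# A distilled sample's label is a distribution, not a one-hot vector.
soft_label = np.array([0.7, 0.2, 0.1])   # hypothetical learned soft label
logits = np.array([2.0, 0.5, -1.0])      # model output on the synthetic sample
loss = soft_cross_entropy(logits, soft_label)
```

In the SLDD setting both `logits`' input sample and `soft_label` would be trainable parameters updated by backpropagating through this loss; here they are fixed arrays for illustration.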

Evaluating Defensive Distillation For Defending Text Processing Neural Networks Against Adversarial Examples [article]

Marcus Soll, Tobias Hinz, Sven Magg, Stefan Wermter
2019 arXiv   pre-print
However, instead of applying defensive distillation to networks for image classification, we examine, for the first time, its performance on text classification tasks and also evaluate its effect on the  ...  Our results indicate that defensive distillation only has a minimal impact on text classifying neural networks and does neither help with increasing their robustness against adversarial examples nor prevent  ...  The following software libraries were used for this work: Keras, Tensorflow, Gensim, NLTK with the WordNet interface, and NumPy.  ... 
arXiv:1908.07899v1 fatcat:mw6cofoclfepvncghbosi3uog4

Distilled One-Shot Federated Learning [article]

Yanlin Zhou, George Pu, Xiyao Ma, Xiaolin Li, Dapeng Wu
2021 arXiv   pre-print
Inspired by recent work on dataset distillation and distributed one-shot learning, we propose Distilled One-Shot Federated Learning (DOSFL) to significantly reduce the communication cost while achieving  ...  In just one round, each client distills their private dataset, sends the synthetic data (e.g. images or sentences) to the server, and collectively trains a global model.  ...  Using soft label dataset distillation, Sucholutsky et al.  ... 
arXiv:2009.07999v3 fatcat:k4yomgztl5a3nkbqp3rubepuc4

Distilled Dual-Encoder Model for Vision-Language Understanding [article]

Zekun Wang, Wenhui Wang, Haichao Zhu, Ming Liu, Bing Qin, Furu Wei
2021 arXiv   pre-print
In order to learn deep interactions of images and text, we introduce cross-modal attention distillation, which uses the image-to-text and text-to-image attention distributions of a fusion-encoder model  ...  Dual-encoder models have a faster inference speed than fusion-encoder models and enable the pre-computation of images and text during inference.  ...  [Table: training objectives — ground-truth, cross-modal soft-label, and attention-label distillation, for pre-training tasks (image-text matching, image-text contrastive, masked LM) and fine-tuning (VL understanding)]  ... 
arXiv:2112.08723v1 fatcat:lkmjcqn4q5fsjextn4fy37lj2a

Learning from Noisy Labels with Distillation [article]

Yuncheng Li, Jianchao Yang, Yale Song, Liangliang Cao, Jiebo Luo, Li-Jia Li
2017 arXiv   pre-print
In this work, we propose a unified distillation framework to use side information, including a small clean dataset and label relations in knowledge graph, to "hedge the risk" of learning from noisy labels  ...  Furthermore, unlike the traditional approaches evaluated based on simulated label noises, we propose a suite of new benchmark datasets, in Sports, Species and Artifacts domains, to evaluate the task of  ...  As shown in the analysis of Section 3.2, the distillation parameters λ and knowledge graph G need to be properly chosen and designed, in order for the soft labels to achieve  ... 
arXiv:1703.02391v2 fatcat:2wctk4ub7jabzhf32njltr3aiy

Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation [article]

Ruizhe Cheng, Bichen Wu, Peizhao Zhang, Peter Vajda, Joseph E. Gonzalez
2021 arXiv   pre-print
We propose a data-efficient contrastive distillation method that uses soft labels to learn from noisy image-text pairs.  ...  CLIP, however, is data hungry and requires more than 400M image-text pairs for training.  ...  Instead, we propose to use a hybrid of hard contrastive and soft distillation losses. We distill the model from its running Exponential Moving Average (EMA) with soft labels, as a method of denoising.  ... 
arXiv:2104.08945v1 fatcat:ptrgrcpdevfnldd6nr6qzbmtay
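The "running Exponential Moving Average" teacher mentioned in this entry is a standard construction: the teacher's weights are a slow-moving average of the student's. A minimal sketch, with a hypothetical momentum value and toy weight dictionaries standing in for real model parameters:

```python
import numpy as np

def ema_update(teacher, student, momentum=0.999):
    """One EMA step: each teacher weight drifts slowly toward the
    corresponding student weight. The resulting teacher produces
    smoother soft labels, which is why it can act as a denoiser."""
    return {name: momentum * teacher[name] + (1.0 - momentum) * student[name]
            for name in teacher}

teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}
teacher = ema_update(teacher, student, momentum=0.9)
# teacher["w"] is now 0.1 everywhere
```

With momentum close to 1, the teacher changes very little per step and effectively averages the student over many recent updates.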

Distilling BERT into Simple Neural Networks with Unlabeled Transfer Data [article]

Subhabrata Mukherjee, Ahmed Hassan Awadallah
2020 arXiv   pre-print
The student performance can be further improved with soft distillation and leveraging teacher intermediate representations.  ...  In this work, we leverage large amounts of in-domain unlabeled transfer data in addition to a limited amount of labeled training instances to bridge this gap for distilling BERT.  ...  Distillation with BERT Large (with pre-training and fine-tuning) on 100 labeled samples per class. [Table: per-dataset results for the RNN student without and with distillation, compared to BERT]  ... 
arXiv:1910.01769v2 fatcat:hy4hu5krm5bg7o3cg74ypbanoe

MSD: Saliency-aware Knowledge Distillation for Multimodal Understanding [article]

Woojeong Jin, Maziar Sanjabi, Shaoliang Nie, Liang Tan, Xiang Ren, Hamed Firooz
2021 arXiv   pre-print
In this paper, we perform a large-scale empirical study to investigate the importance and effects of each modality in knowledge distillation.  ...  loss on soft labels, and λ ∈ [0, 1] controls the balance between hard and soft targets.  ... 
arXiv:2101.01881v2 fatcat:d2x74ys6wvcwlp5ia2bmg3hx2i
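The snippet above describes the standard knowledge-distillation objective: a λ-weighted mix of hard cross-entropy against the ground-truth label and soft cross-entropy against the teacher's temperature-smoothed distribution. A minimal numpy sketch of that common formulation (the exact loss in the paper may differ; λ and T values here are illustrative):

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled, numerically stable softmax.
    z = z / T
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def kd_loss(student_logits, teacher_logits, hard_label, lam=0.5, T=2.0):
    """lam * CE(hard label) + (1 - lam) * CE(teacher's soft targets),
    with both distributions smoothed by temperature T for the soft term."""
    p_s = softmax(student_logits)
    hard = -np.log(p_s[hard_label] + 1e-12)
    p_s_T = softmax(student_logits, T)
    p_t_T = softmax(teacher_logits, T)
    soft = -np.sum(p_t_T * np.log(p_s_T + 1e-12))
    return lam * hard + (1.0 - lam) * soft
```

Setting λ = 1 recovers plain supervised training; λ = 0 trains purely on the teacher's soft targets.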

BERTtoCNN: Similarity-preserving enhanced knowledge distillation for stance detection

Yang Li, Yuqing Sun, Nana Zhu, Weinan Zhang
2021 PLoS ONE  
We conduct experiments and test the proposed model on the open Chinese and English stance detection datasets.  ...  In recent years, text sentiment analysis has attracted wide attention and has driven the development of stance detection research.  ...  The soft labels and ground-truth labels from the teacher model are important for improving the student model's performance, and are used to compute the distillation loss and the student loss, respectively.  ... 
doi:10.1371/journal.pone.0257130 pmid:34506549 pmcid:PMC8432858 fatcat:hnfbgotconefrjpgsx4pmbicu4

Distilling Knowledge from Well-Informed Soft Labels for Neural Relation Extraction

Zhenyu Zhang, Xiaobo Shu, Bowen Yu, Tingwen Liu, Jiapeng Zhao, Quangang Li, Li Guo
2020 Proceedings of the AAAI Conference on Artificial Intelligence
Furthermore, this model is regarded as teacher to generate well-informed soft labels and guide the optimization of a student network via knowledge distillation.  ...  In this paper, we aim to explore the supervision with soft labels in relation extraction, which makes it possible to integrate prior knowledge.  ...  In this paper, we employ knowledge distillation to help us mine soft labels and transfer knowledge for RE.  ... 
doi:10.1609/aaai.v34i05.6509 fatcat:cqcm3hstd5gztbeyuwbo2rgqvy

Cross-lingual Distillation for Text Classification [article]

Ruochen Xu, Yiming Yang
2018 arXiv   pre-print
We conducted experiments on two benchmark CLTC datasets, treating English as the source language and German, French, Japanese and Chinese as the unlabeled target languages.  ...  Using soft probabilistic predictions for the documents in a label-rich language as the (induced) supervisory labels in a parallel corpus of documents, we train classifiers successfully for new languages  ...  the soft labels made by the source classifier on the source language side.  ... 
arXiv:1705.02073v2 fatcat:cojoijcakjh7doluti5ffcbygm
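The mechanism this entry describes — a source-language classifier's soft predictions on one side of a parallel corpus serving as induced labels for the aligned target-language side — can be sketched in a few lines of numpy. The logits and the English/German pairing are hypothetical placeholders, not data or code from the paper:

```python
import numpy as np

def softmax(z):
    # Row-wise, numerically stable softmax over a batch of logit vectors.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical: a trained English classifier scores the English side of a
# parallel corpus; its soft predictions become the training targets for
# the aligned (unlabeled) German sentences.
source_logits = np.array([[3.0, 0.1],    # English sentence 1 → mostly class 0
                          [0.2, 2.5]])   # English sentence 2 → mostly class 1
soft_targets = softmax(source_logits)

def student_loss(student_logits, soft_targets):
    """Mean soft cross-entropy of the target-language student against
    the induced (source-side) soft labels."""
    p = softmax(student_logits)
    return -np.mean(np.sum(soft_targets * np.log(p + 1e-12), axis=1))

student_logits = np.zeros((2, 2))  # untrained student: uniform predictions
loss = student_loss(student_logits, soft_targets)
```

No target-language annotation is needed; the supervision is entirely induced through the document alignment.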

Learning from a Lightweight Teacher for Efficient Knowledge Distillation [article]

Yuang Liu, Wei Zhang, Jun Wang
2020 arXiv   pre-print
The recent study shows vanilla KD plays a similar role as label smoothing and develops teacher-free KD, being efficient and mitigating the issue of learning from heavy teachers.  ...  Knowledge Distillation (KD) is an effective framework for compressing deep learning models, realized by a student-teacher paradigm requiring small student networks to mimic the soft target generated by  ...  Datasets To ensure reliable comparison, we adopt multiple datasets, covering the modalities of image, text, and video. CIFAR10 and CIFAR100.  ... 
arXiv:2005.09163v1 fatcat:5ddet6muqzawtdavsyra5eqzty

VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer [article]

Zineng Tang, Jaemin Cho, Hao Tan, Mohit Bansal
2021 arXiv   pre-print
Despite its success, the method suffers from approximation error of using finite image labels and the lack of vocabulary diversity of a small image-text dataset.  ...  We train a multi-modal teacher model on a video-text dataset, and then transfer its knowledge to a student language model with a text dataset.  ...  We thank Yixin Nie and Gabriel Ilharco for useful dataset suggestions.  ... 
arXiv:2107.02681v2 fatcat:etpkmmnjpjbkzkfawritb6brhm

Cross-lingual Distillation for Text Classification

Ruochen Xu, Yiming Yang
2017 Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)  
We conducted experiments on two benchmark CLTC datasets, treating English as the source language and German, French, Japanese and Chinese as the unlabeled target languages.  ...  Using soft probabilistic predictions for the documents in a label-rich language as the (induced) supervisory labels in a parallel corpus of documents, we train classifiers successfully for new languages  ...  the soft labels made by the source classifier on the source language side.  ... 
doi:10.18653/v1/p17-1130 dblp:conf/acl/XuY17 fatcat:vjrihy45pbhsnal7sfrlkmfkl4

Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding [article]

Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao
2019 arXiv   pre-print
This paper explores the use of knowledge distillation to improve a Multi-Task Deep Neural Network (MT-DNN) (Liu et al., 2019) for learning text representations across multiple natural language understanding  ...  For each task, we train an ensemble of different MT-DNNs (teacher) that outperforms any single model, and then train a single MT-DNN (student) via multi-task learning to distill knowledge from these ensemble  ...  Acknowledgments We thank Asli Celikyilmaz, Xuedong Huang, Moontae Lee, Chunyuan Li, Xiujun Li, and Michael Patterson for helpful discussions and comments.  ... 
arXiv:1904.09482v1 fatcat:najnmz3przajfga5s7zm6tyd3a
Showing results 1 — 15 out of 3,404 results