93,785 Hits in 3.3 sec

Distilling the Knowledge from Conditional Normalizing Flows [article]

Dmitry Baranchuk, Vladimir Aliev, Artem Babenko
2021 arXiv   pre-print
We provide a positive answer to this question by proposing a simple distillation approach and demonstrating its effectiveness on state-of-the-art conditional flow-based models for image super-resolution  ...  In this work, we investigate whether one can distill flow-based models into more efficient alternatives.  ...  The closest works to ours distill knowledge from an expensive autoregressive neural vocoder to normalizing flows with parallel inference.  ... 
arXiv:2106.12699v3 fatcat:7fn3hnwtyrbqbhdvkozp2rprfm
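
The distillation setup sketched in the snippet above can be illustrated, under heavy assumptions, as a feed-forward student regressing samples drawn from a conditional flow teacher for the same low-resolution input. The `sample_teacher` helper, the MSE objective, and the placeholder networks below are illustrative stand-ins, not the paper's actual models or loss.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def sample_teacher(flow_model, lr_image):
    # Draw one super-resolved sample from a conditional flow teacher.
    # The teacher's sampling interface is a stand-in assumption here.
    return flow_model(lr_image)

def distillation_step(student, flow_model, lr_image):
    # One plausible objective: the student reproduces the teacher's
    # sample for the same low-resolution conditioning input.
    target = sample_teacher(flow_model, lr_image)
    return F.mse_loss(student(lr_image), target)

# Toy usage with placeholder networks of matching input/output shapes.
student = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)
teacher = lambda x: x          # identity stand-in for the flow sampler
loss = distillation_step(student, teacher, torch.randn(1, 3, 32, 32))
loss.backward()
```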

Sentence Embeddings by Ensemble Distillation [article]

Fredrik Carlsson, Magnus Sahlgren
2021 arXiv   pre-print
Our experiments demonstrate that a model trained to learn the average embedding space from multiple ensemble students outperforms all the other individual models with high robustness.  ...  We compare and combine a number of recently proposed sentence embedding methods for STS, and propose a novel and simple ensemble knowledge distillation scheme that improves on previous approaches.  ...  However, the Flow normalization is only applied to the final distillation learner, and not to the ensemble models.  ... 
arXiv:2104.06719v1 fatcat:qut2ewkxhjdbdfrhfsw6yjohby
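
One way to read "learn the average embedding space from multiple ensemble students" is as a regression toward the mean teacher embedding; the sketch below shows that reading with random stand-in tensors. The batch size, embedding width, and MSE objective are assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def ensemble_distillation_loss(student_emb, teacher_embs):
    # MSE between the student's sentence embeddings and the mean of the
    # teacher ensemble's embeddings -- one way to "learn the average
    # embedding space" described in the snippet above.
    target = torch.stack(teacher_embs, dim=0).mean(dim=0)
    return F.mse_loss(student_emb, target)

# Toy usage: a batch of 8 sentences, three frozen teacher encoders,
# 768-dimensional embeddings (all of these sizes are assumptions).
student = torch.randn(8, 768, requires_grad=True)
teachers = [torch.randn(8, 768) for _ in range(3)]
loss = ensemble_distillation_loss(student, teachers)
loss.backward()
```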

Graph Flow: Cross-layer Graph Flow Distillation for Dual Efficient Medical Image Segmentation [article]

Wenxuan Zou, Muyi Sun
2022 arXiv   pre-print
Specifically, our core Graph Flow Distillation transfers the essence of cross-layer variations from a well-trained cumbersome teacher network to a non-trained compact student network.  ...  To tackle these problems, we propose Graph Flow, a comprehensive knowledge distillation framework, for both network-efficient and annotation-efficient medical image segmentation.  ...  Our main contributions are as follows: • We propose Graph Flow Distillation, a novel knowledge distillation method, to transfer the flow of cross-layer salience graphs from a teacher network to a student  ... 
arXiv:2203.08667v4 fatcat:nfmug3u5nbba5p2z33wxmfdtka
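
For context on what "flow of cross-layer" knowledge can look like in code, the sketch below uses the classic FSP-style cross-layer Gram matrix and matches it between teacher and student; the paper's salience-graph construction is not reproduced here, and the shapes are arbitrary.

```python
import torch
import torch.nn.functional as F

def flow_matrix(feat_low, feat_high):
    # Cross-layer "flow" between two feature maps of equal spatial size:
    # a (C_low, C_high) Gram matrix averaged over spatial positions
    # (the classic FSP-style formulation; Graph Flow builds salience
    # graphs instead, which this sketch does not reproduce).
    b, c1, h, w = feat_low.shape
    c2 = feat_high.shape[1]
    low = feat_low.reshape(b, c1, h * w)
    high = feat_high.reshape(b, c2, h * w)
    return torch.bmm(low, high.transpose(1, 2)) / (h * w)  # (B, C_low, C_high)

def cross_layer_flow_loss(t_low, t_high, s_low, s_high):
    # Match the student's cross-layer flow to the (frozen) teacher's.
    return F.mse_loss(flow_matrix(s_low, s_high),
                      flow_matrix(t_low, t_high).detach())

# Toy usage; in practice 1x1 convolutions would align channel counts so
# teacher and student flow matrices have equal shape.
t1, t2 = torch.randn(2, 64, 16, 16), torch.randn(2, 128, 16, 16)
s1, s2 = torch.randn(2, 64, 16, 16), torch.randn(2, 128, 16, 16)
loss = cross_layer_flow_loss(t1, t2, s1, s2)
```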

Porn Streamer Recognition in Live Video Based on Multimodal Knowledge Distillation

WANG Liyuan, ZHANG Jing, YAO Jiacheng, ZHUO Li
2021 Chinese Journal of Electronics  
Second, a lightweight student model constructed with MobileNetV2 and Xception transfers the knowledge from the teacher model by using a multimodal knowledge distillation strategy.  ...  In order to improve the recognition efficiency of porn streamers in live video, a deep network model compression method based on multimodal knowledge distillation is proposed.  ...  recognition practice to "distill" knowledge from the teacher model to the student model.  ... 
doi:10.1049/cje.2021.07.027 fatcat:uc2kal47jzbevlsfr3y6l3xzqq
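
The teacher-to-student transfer described above is commonly implemented with the standard soft-target distillation loss; a minimal sketch follows. The temperature value and the two-class toy logits are assumptions, and the paper's multimodal strategy likely adds further terms not shown here.

```python
import torch
import torch.nn.functional as F

def soft_target_kd_loss(student_logits, teacher_logits, temperature=4.0):
    # Standard soft-target distillation loss (Hinton et al.):
    # KL divergence between temperature-softened teacher and student
    # distributions, scaled by T^2 to keep gradient magnitudes stable.
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)

# Toy usage on a batch of 16 two-class logits (sizes are assumptions).
loss = soft_target_kd_loss(torch.randn(16, 2), torch.randn(16, 2))
```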

BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover's Distance [article]

Jianquan Li, Xiaokang Liu, Honghong Zhao, Ruifeng Xu, Min Yang, Yaohong Jin
2020 arXiv   pre-print
In addition, we leverage Earth Mover's Distance (EMD) to compute the minimum cumulative cost that must be paid to transform knowledge from teacher network to student network.  ...  motivated by the intuition that different NLP tasks require different levels of linguistic knowledge contained in the intermediate layers of BERT.  ...  Once the optimal mapping flow $F^A$ is learned, we can define the Earth Mover's Distance as the work normalized by the total flow: $\mathrm{EMD}(A^S, A^T) = \frac{\sum_{i=1}^{M}\sum_{j=1}^{N} f^A_{ij}\, d^A_{ij}}{\sum_{i=1}^{M}\sum_{j=1}^{N} f^A_{ij}}$ (9). Finally  ... 
arXiv:2010.06133v1 fatcat:ttbwizqsdfaupdfiztap5z6zf4
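
A minimal NumPy sketch of the flow-normalized EMD in Eq. (9) above, assuming the mapping flow and ground-distance matrices are already given (the paper learns the optimal flow; a uniform flow is used here purely for illustration).

```python
import numpy as np

def emd(flow, dist):
    # Earth Mover's Distance as the total transport work normalized by
    # the total flow:  sum_ij f_ij * d_ij / sum_ij f_ij.
    # flow[i, j]: amount moved from student layer i to teacher layer j;
    # dist[i, j]: ground distance between the two layers.
    return float(np.sum(flow * dist) / np.sum(flow))

# Illustration with a hypothetical uniform flow between M student layers
# and N teacher layers (the paper instead *learns* the optimal flow).
M, N = 4, 12
flow = np.full((M, N), 1.0 / (M * N))
dist = np.random.rand(M, N)     # stand-in for layer-wise distances
print(emd(flow, dist))
```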

Anomaly Detection via Reverse Distillation from One-Class Embedding [article]

Hanqiu Deng, Xingyu Li
2022 arXiv   pre-print
Inherently, knowledge distillation in this study starts from abstract, high-level presentations to low-level features.  ...  Knowledge distillation (KD) achieves promising results on the challenging problem of unsupervised anomaly detection (AD). The representation discrepancy of anomalies in the teacher-student (T-S) model provides  ...  data flow in the T-S model during knowledge transfer/distillation.  ... 
arXiv:2201.10703v2 fatcat:6vr5ouot4nafdco6wxn3cclr3m
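
A common way to turn the T-S representation discrepancy mentioned above into an anomaly score is one minus the channel-wise cosine similarity between teacher and student feature maps; the sketch below shows that generic scoring rule with random stand-in features, not the paper's full reverse-distillation pipeline.

```python
import torch
import torch.nn.functional as F

def anomaly_map(teacher_feat, student_feat):
    # Per-location anomaly score on (B, C, H, W) feature maps:
    # 1 - cosine similarity along the channel dimension, so regions the
    # student fails to reproduce from the teacher score high.
    return 1.0 - F.cosine_similarity(teacher_feat, student_feat, dim=1)

# Toy usage with random features standing in for one T-S layer pair.
t = torch.randn(2, 256, 32, 32)
s = torch.randn(2, 256, 32, 32)
print(anomaly_map(t, s).shape)   # torch.Size([2, 32, 32])
```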

Knowledge Flow: Improve Upon Your Teachers [article]

Iou-Jen Liu and Jian Peng and Alexander G. Schwing
2019 arXiv   pre-print
To address this issue, in this paper, we develop knowledge flow which moves 'knowledge' from multiple deep nets, referred to as teachers, to a new deep net model, called the student.  ...  Upon training with knowledge flow the student is independent of the teachers.  ...  Figure 1 : (a) Example of a two-teacher knowledge flow. (b) Deep net transformation of knowledge flow. (c) Average normalized weights for teachers' and the student's layers.  ... 
arXiv:1904.05878v1 fatcat:xsdxqdkxmjhshhlazecuva5iim
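
A much-simplified reading of moving "knowledge" from several teachers into one student is to fuse matching layer outputs with learnable, normalized weights; the sketch below shows only that fusion step. The module name, shapes, and the softmax normalization are assumptions for illustration.

```python
import torch
import torch.nn as nn

class WeightedFusion(nn.Module):
    # Combine one student layer with matching layers from several frozen
    # teachers via learnable weights that are normalized to sum to one.
    def __init__(self, num_sources):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_sources))

    def forward(self, features):
        w = torch.softmax(self.logits, dim=0)   # weights sum to 1
        return sum(w[i] * f for i, f in enumerate(features))

# Toy usage: student plus two teacher feature maps of the same shape.
fusion = WeightedFusion(num_sources=3)
feats = [torch.randn(2, 64, 8, 8) for _ in range(3)]
out = fusion(feats)   # (2, 64, 8, 8)
```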

Safe Distillation Box [article]

Jingwen Ye, Yining Mao, Jie Song, Xinchao Wang, Cheng Jin, Mingli Song
2021 arXiv   pre-print
Knowledge distillation (KD) has recently emerged as a powerful strategy to transfer knowledge from a pre-trained teacher model to a lightweight student, and has demonstrated its unprecedented success over  ...  In spite of the encouraging results, the KD process per se poses a potential threat to network ownership protection, since the knowledge contained in a network can be effortlessly distilled and hence exposed  ...  We compare the distillation performances from the normal and the knowledgeable teachers, as depicted in Table 2.  ... 
arXiv:2112.03695v1 fatcat:kjpmosh4djhfnkgyqq3a3s5oqm

Towards the systematic design of actuation for process systems

A.E.M. Huesman, O.H. Bosgra, P.M.J. Van den Hof
2010 IFAC Proceedings Volumes  
This paper proposes geometry and flux equations as the required domain knowledge.  ...  Currently, systematic design of actuation (operational degrees of freedom) for process systems is not possible because (i) the required domain knowledge has not been identified and (ii) it is unclear how  ...  For example, a distillation column under total reflux is operated in a completely "closed" mode since there is no feed or product flow from and to the environment.  ... 
doi:10.3182/20100705-3-be-2011.00078 fatcat:5ouxi2ekqvh5nirtx2uhnx4xli

Attention Distillation for Learning Video Representations [article]

Miao Liu, Xin Chen, Yun Zhang, Yin Li, James M. Rehg
2020 arXiv   pre-print
Specifically, we propose to leverage output attention maps as a vehicle to transfer the learned representation from a motion (flow) network to an RGB network.  ...  We systematically study the design of attention modules, and develop a novel method for attention distillation.  ...  Distillation without Forgetting. Feature distillation might "overwrite" the features from the RGB stream with the features from the flow stream.  ... 
arXiv:1904.03249v2 fatcat:hq27jjsyang4rcvhrykznurpme
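
The attention-transfer idea in the snippet above can be sketched as matching normalized spatial attention maps derived from the two streams' feature activations; the formulation below follows the common sum-of-squared-activations recipe rather than the paper's specific attention modules.

```python
import torch
import torch.nn.functional as F

def spatial_attention(feat):
    # Collapse a (B, C, H, W) feature map into an L2-normalized (B, H*W)
    # spatial attention map via the sum of squared channel activations.
    att = feat.pow(2).sum(dim=1).flatten(1)
    return F.normalize(att, p=2, dim=1)

def attention_distillation_loss(rgb_feat, flow_feat):
    # Push the RGB network's attention toward the (frozen) flow
    # network's attention.
    return F.mse_loss(spatial_attention(rgb_feat),
                      spatial_attention(flow_feat).detach())

# Toy usage with random activations standing in for the two streams.
loss = attention_distillation_loss(torch.randn(4, 512, 14, 14),
                                   torch.randn(4, 512, 14, 14))
```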

DMCL: Distillation Multiple Choice Learning for Multimodal Action Recognition [article]

Nuno C. Garcia, Sarah Adel Bargal, Vitaly Ablavsky, Pietro Morerio, Vittorio Murino, Stan Sclaroff
2019 arXiv   pre-print
We introduce a novel Distillation Multiple Choice Learning framework for multimodal data, where different modality networks learn in a cooperative setting from scratch, strengthening one another.  ...  Our goal is to leverage the complementary information of multiple modalities to the benefit of the ensemble and each individual network.  ...  competitive to or state-of-the-art results compared to the privileged information literature, and significantly higher accuracy compared to independently trained modality networks for human action recognition  ... 
arXiv:1912.10982v1 fatcat:aplmcrqnufai7mjrf4rgqzcw2u

A Novel Multi-Knowledge Distillation Approach

Lianqiang LI, Kangbo SUN, Jie ZHU
2021 IEICE Transactions on Information and Systems  
This paper proposes a novel knowledge distillation approach called multi-knowledge distillation (MKD). MKD consists of two stages.  ...  Knowledge distillation approaches can transfer information from a large network (teacher network) to a small network (student network) to compress and accelerate deep neural networks.  ...  Knowledge Extraction. MKD aims to distill the important information from the teacher network's feature maps (FM) to the student network.  ... 
doi:10.1587/transinf.2020edl8080 fatcat:mxlnixumwnbazgqjnmlfuwt6fi

Distilling Cross-Task Knowledge via Relationship Matching

Han-Jia Ye, Su Lu, De-Chuan Zhan
2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
) approach, which decomposes the knowledge distillation flow into branches for embedding and the top-layer classifier.  ...  The discriminative knowledge from a high-capacity deep neural network (a.k.a. the "teacher") could be distilled to facilitate the learning efficacy of a shallow counterpart (a.k.a. the "student").  ...  Acknowledgments. This work is partially supported by The National Key R&D Program of China (2018YFB1004300), NSFC (61773198, 61773198, 61632004), and NSFC-NRF joint research project (61861146001).  ... 
doi:10.1109/cvpr42600.2020.01241 dblp:conf/cvpr/YeLZ20 fatcat:ztncbbl4xrfpbgm2sm5xdvcxty
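
A generic sketch of distillation via relationship matching: the student's batch-wise similarity structure is pulled toward the teacher's, which is label-space agnostic and therefore suits cross-task transfer. The cosine-similarity Gram matrices and MSE objective below are illustrative choices, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def relation_matching_loss(student_emb, teacher_emb):
    # Match the pairwise cosine-similarity structure of a batch of
    # student embeddings to that of the teacher; the two label spaces
    # never enter the loss, only instance-to-instance relationships.
    s = F.normalize(student_emb, dim=1)
    t = F.normalize(teacher_emb, dim=1)
    return F.mse_loss(s @ s.T, (t @ t.T).detach())

# Toy usage: 32 samples, different embedding widths for student and
# teacher (both similarity matrices are 32 x 32, so the loss is valid).
loss = relation_matching_loss(torch.randn(32, 128), torch.randn(32, 512))
```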

Fast Video Salient Object Detection via Spatiotemporal Knowledge Distillation [article]

Yi Tang and Yuanman Li and Wenbin Zou
2021 arXiv   pre-print
from adjacent frames.  ...  In the temporal aspect, we propose a temporal knowledge distillation strategy, which allows the network to learn the robust temporal features through the infer-frame feature encoding and distilling information  ...  Figure 1: Saliency results from different approaches. Figure 2: A brief description of the traditional knowledge distillation and self-distillation.  ... 
arXiv:2010.10027v2 fatcat:rvdln7uz2rgb5ipvsswv7yntsi

Adversarial Optimization-Based Knowledge Transfer of Layer-Wise Dense Flow for Image Classification

Doyeob Yeo, Min-Suk Kim, Ji-Hoon Bae
2021 Applied Sciences  
We propose a semi-supervised learning-based knowledge transfer with multiple items of dense flow-based knowledge extracted from the pre-trained DNN.  ...  Knowledge distillation to another target DNN, based on adversarial loss functions, uses multiple flow-based knowledge items that are densely extracted, with overlap, from a pre-trained DNN  ...  Conflicts of Interest: The authors declare no conflict of interest.  ... 
doi:10.3390/app11083720 fatcat:bkjhoyqzp5ftxp5xdsiino5yuy
Showing results 1 — 15 out of 93,785 results