
Knowledge Distillation for Quality Estimation [article]

Amit Gajbhiye, Marina Fomicheva, Fernando Alva-Manchego, Frédéric Blain, Abiola Obamuyide, Nikolaos Aletras, Lucia Specia
2021 arXiv   pre-print
Quality Estimation (QE) is the task of automatically predicting Machine Translation quality in the absence of reference translations, making it applicable in real-time settings, such as translating online  ...  Models trained on distilled pre-trained representations remain prohibitively large for many usage scenarios.  ...  Introduction Quality Estimation (QE) aims to predict the quality of the output of Machine Translation (MT) systems when no gold-standard translations are available.  ... 
arXiv:2107.00411v1 fatcat:bkakpl4mwrcajnwyh5ghzw2t3m
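
The record above (and its published version below) describes compressing a large QE model into a much smaller student. A minimal sketch of score-level distillation for sentence-level QE, assuming a frozen teacher that outputs quality scores; the student architecture, dimensions, and plain MSE objective are illustrative assumptions, not the authors' setup.

# Hypothetical sketch: a small student regressor is trained to match the
# quality scores predicted by a large frozen teacher QE model.
import torch
import torch.nn as nn

class TinyStudent(nn.Module):
    def __init__(self, vocab_size=32000, emb_dim=128, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)  # sentence-level quality score

    def forward(self, token_ids):
        h, _ = self.rnn(self.emb(token_ids))
        return self.head(h.mean(dim=1)).squeeze(-1)

def distillation_step(student, teacher, token_ids, optimizer):
    """One step of score-level KD: regress the frozen teacher's quality score."""
    with torch.no_grad():
        teacher_score = teacher(token_ids)      # assumed frozen large QE model
    student_score = student(token_ids)
    loss = nn.functional.mse_loss(student_score, teacher_score)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()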

Knowledge Distillation for Quality Estimation

Amit Gajbhiye, Marina Fomicheva, Fernando Alva-Manchego, Frédéric Blain, Abiola Obamuyide, Nikolaos Aletras, Lucia Specia
2021 Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021   unpublished
Quality Estimation (QE) is the task of automatically predicting Machine Translation quality in the absence of reference translations, making it applicable in real-time settings, such as translating online  ...  Models trained on distilled pre-trained representations remain prohibitively large for many usage scenarios.  ...  Introduction Quality Estimation (QE) aims to predict the quality of the output of Machine Translation (MT) systems when no gold-standard translations are available.  ... 
doi:10.18653/v1/2021.findings-acl.452 fatcat:3hr72dybb5b3ddjrpwkg65hp7m

Parser-Free Virtual Try-on via Distilling Appearance Flows [article]

Yuying Ge, Yibing Song, Ruimao Zhang, Chongjian Ge, Wei Liu, Ping Luo
2021 arXiv   pre-print
"teacher knowledge", which is extracted from the real person images in a self-supervised way. (2) Other than using real images as supervisions, we formulate knowledge distillation in the try-on problem  ...  as distilling the appearance flows between the person image and the garment image, enabling us to find accurate dense correspondences between them to produce high-quality results. (3) Extensive evaluations  ...  Knowledge Distillation.  ... 
arXiv:2103.04559v2 fatcat:b65zvjkw7nckrcsk76uftl3siq
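
A hedged sketch of the appearance-flow distillation described in the record above: the parser-free student is trained so its predicted dense flow matches a teacher's flow. The endpoint-error form of the loss and the tensor layout are assumptions, not the paper's exact objective.

import torch

def flow_distillation_loss(student_flow, teacher_flow):
    # flows: (B, 2, H, W) dense correspondences between garment and person image.
    # Average endpoint error between student flow and the detached teacher flow.
    return (student_flow - teacher_flow.detach()).pow(2).sum(dim=1).sqrt().mean()

# Usage with dummy tensors (shapes are illustrative):
s = torch.randn(2, 2, 64, 48, requires_grad=True)
t = torch.randn(2, 2, 64, 48)
loss = flow_distillation_loss(s, t)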

Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network [article]

Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari
2021 arXiv   pre-print
We perform knowledge distillation from a GPT2-based context prediction network into a simple recurrent model by minimizing a teacher-student loss defined between the context embedding vectors of those  ...  Although this method achieves comparable speech quality to that of a method that waits for the future context, it entails a huge amount of processing for sampling from the language model at each time step  ...  CONCLUSIONS We proposed a knowledge distillation method for efficiently estimating contextual embedding while using the linguistic knowledge of a large pre-trained language model.  ... 
arXiv:2109.10724v1 fatcat:aps77vtzwbgh3a4pcgh7yo3qee
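
The record above distills a GPT2-based context prediction network into a simple recurrent model by minimizing a teacher-student loss between context embedding vectors. A minimal sketch of that loss; the student architecture, the embedding sizes, and the use of the last hidden state as the context vector are assumptions.

import torch.nn as nn

class RecurrentContextStudent(nn.Module):
    def __init__(self, vocab_size=50257, emb_dim=256, ctx_dim=768):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, ctx_dim, batch_first=True)

    def forward(self, token_ids):
        out, _ = self.rnn(self.emb(token_ids))
        return out[:, -1]          # context embedding at the current position

def context_distillation_loss(student, teacher_embedding, token_ids):
    """L2 distance between student and precomputed teacher context embeddings."""
    student_embedding = student(token_ids)
    return nn.functional.mse_loss(student_embedding, teacher_embedding)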

Speech Enhancement Using Generative Adversarial Network by Distilling Knowledge from Statistical Method

Jianfeng Wu, Yongzhu Hua, Shengying Yang, Hongshuai Qin, Huibin Qin
2019 Applied Sciences  
Then, the discriminator network and generator network are re-trained by distilling knowledge from the statistical method, which is inspired by the knowledge distillation in a neural network.  ...  This paper presents a new deep neural network (DNN)-based speech enhancement algorithm by integrating the distilled knowledge from the traditional statistical-based method.  ...  Table 1. PESQ scores for speech quality test on the development set and evaluation set for each environment.  ... 
doi:10.3390/app9163396 fatcat:tgjqxpeb4zcqxdzdj7w2fyj73y
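
One plausible reading of "distilling knowledge from the statistical method" in the record above is an extra regression term that pulls the generator output toward the statistical enhancer's estimate. The LSGAN-style adversarial term, the L1 form of the distillation term, and the weight lam below are assumptions rather than the paper's exact losses.

import torch.nn.functional as F

def generator_loss(generator, discriminator, noisy, statistical_estimate, lam=100.0):
    enhanced = generator(noisy)
    adv = (discriminator(enhanced) - 1.0).pow(2).mean()     # LSGAN-style adversarial term
    distill = F.l1_loss(enhanced, statistical_estimate)     # knowledge from the statistical method
    return adv + lam * distill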

Online Knowledge Distillation for Efficient Pose Estimation [article]

Zheng Li, Jingwen Ye, Mingli Song, Ying Huang, Zhigeng Pan
2022 arXiv   pre-print
One promising technique to obtain an accurate yet lightweight pose estimator is knowledge distillation, which distills the pose knowledge from a powerful teacher model to a less-parameterized student model  ...  Existing state-of-the-art human pose estimation methods require heavy computational resources for accurate predictions.  ...  Figure 2. An overview of the proposed Online Knowledge Distillation for Human Pose estimation (OKDHP). Each branch serves as an independent pose estimator.  ... 
arXiv:2108.02092v2 fatcat:gejqfkjltveytpxi6nid3ctzgm
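
A sketch of online distillation for heatmap-based pose estimation in the spirit of the record above, where each student branch is also supervised by an ensemble teacher built from the branches themselves. The simple mean ensemble and the weight alpha are stand-ins for the paper's learned aggregation.

import torch
import torch.nn.functional as F

def online_kd_loss(branch_heatmaps, gt_heatmaps, alpha=0.5):
    """branch_heatmaps: list of (B, K, H, W) tensors, one per student branch."""
    teacher = torch.stack(branch_heatmaps, dim=0).mean(dim=0).detach()
    task = sum(F.mse_loss(h, gt_heatmaps) for h in branch_heatmaps)      # supervised term
    distill = sum(F.mse_loss(h, teacher) for h in branch_heatmaps)       # online KD term
    return task + alpha * distill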

Symbolic Knowledge Distillation: from General Language Models to Commonsense Models [article]

Peter West, Chandra Bhagavatula, Jack Hessel, Jena D. Hwang, Liwei Jiang, Ronan Le Bras, Ximing Lu, Sean Welleck, Yejin Choi
2021 arXiv   pre-print
Empirical results demonstrate that, for the first time, a human-authored commonsense knowledge graph is surpassed by our automatically distilled variant in all three criteria: quantity, quality, and diversity  ...  Our study leads to a new framework, Symbolic Knowledge Distillation. As with prior art in Knowledge Distillation (Hinton et al., 2015), our approach uses larger models to teach smaller models.  ...  part by the Natural Sciences and Engineering Research Council of Canada (NSERC) (funding reference number 401233309), DARPA MCS program through NIWC Pacific (N66001-19-2-4031), and the Allen Institute for  ... 
arXiv:2110.07178v1 fatcat:5vubwnf6ybh3dbp7oefflcq7t4

Structured Knowledge Distillation for Dense Prediction [article]

Yifan Liu, Changyong Shun, Jingdong Wang, Chunhua Shen
2020 arXiv   pre-print
Previous knowledge distillation strategies used for dense prediction tasks often directly borrow the distillation scheme for image classification and perform knowledge distillation for each pixel separately  ...  The effectiveness of our knowledge distillation approaches is demonstrated by experiments on three dense prediction tasks: semantic segmentation, depth estimation and object detection.  ...  Thus, we only use the structured knowledge distillation in depth estimation task.  ... 
arXiv:1903.04197v7 fatcat:vcwpcgffgndwfo3xye3kdjtm24
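
For context on the per-pixel baseline that the record above builds on (its contribution is adding structured terms, e.g. pair-wise and holistic distillation, on top of it), here is a minimal pixel-wise distillation loss; the temperature and reduction are assumptions.

import torch.nn.functional as F

def pixelwise_kd(student_logits, teacher_logits, T=1.0):
    # logits: (B, C, H, W); KL(teacher || student) over classes,
    # summed over pixels and averaged over the batch.
    p_t = F.softmax(teacher_logits / T, dim=1)
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_s, p_t, reduction="batchmean") * (T * T)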

Knowledge Distillation: A Method for Making Neural Machine Translation More Efficient

Wandri Jooste, Rejwanul Haque, Andy Way
2022 Information  
We show that sequence-level knowledge distillation can be used to train small student models on knowledge distilled from large teacher models.  ...  In this work, we investigate knowledge distillation on a simulated low-resource German-to-English translation task.  ...  L_KD = −Σ_{k=1}^{|V|} q(y = k|x; θ_T) log p(y = k|x; θ). (2) We can now use L_KD to define functions for knowledge distillation for NMT.  ... 
doi:10.3390/info13020088 fatcat:6wfjd3mrdrccrf3acx4oq4sz4y
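
Following the L_KD definition in the record above, a sketch of word-level distillation for NMT in which the student is trained on the teacher's token distributions; the shapes, padding handling, and temperature are assumptions, and sequence-level KD would instead train the student on the teacher's beam-search outputs.

import torch.nn.functional as F

def word_level_kd(student_logits, teacher_logits, pad_mask, T=1.0):
    """logits: (B, L, V); pad_mask: (B, L) float, 1 for real tokens, 0 for padding."""
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    token_kd = -(p_teacher * log_p_student).sum(dim=-1)     # cross-entropy per token
    return (token_kd * pad_mask).sum() / pad_mask.sum() * (T * T)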

Knowledge Distillation Leveraging Alternative Soft Targets from Non-Parallel Qualified Speech Data [article]

Tohru Nagano, Takashi Fukuda, Gakuto Kurata
2021 arXiv   pre-print
This paper describes a novel knowledge distillation framework that leverages acoustically qualified speech data included in an existing training data pool as privileged information.  ...  In our proposed framework, a student network is trained with multiple soft targets for each utterance that consist of main soft targets from original speakers' utterance and alternative targets from other  ...  In the previous literature, for knowledge distillation with privileged information, time-aligned same speakers' utterances are used to estimate q(i|x) and p(i|x).  ... 
arXiv:2112.08878v1 fatcat:23ess57tfrhwpiy4ny5qyh7654
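
A sketch of distillation with multiple soft targets per utterance, as in the record above: the student loss mixes the main soft targets from the original speaker with alternative targets from other qualified speech. The equal 0.5/0.5 weights are assumptions.

import torch.nn.functional as F

def multi_target_kd(student_logits, main_soft, alt_soft, w_main=0.5, w_alt=0.5):
    """student_logits: (B, C); main_soft, alt_soft: (B, C) teacher posteriors."""
    log_p = F.log_softmax(student_logits, dim=-1)
    loss_main = F.kl_div(log_p, main_soft, reduction="batchmean")
    loss_alt = F.kl_div(log_p, alt_soft, reduction="batchmean")
    return w_main * loss_main + w_alt * loss_alt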

Visual Relationship Detection Based on Guided Proposals and Semantic Knowledge Distillation [article]

François Plesse, Alexandru Ginsca, Bertrand Delezoide, Françoise Prêteux
2018 arXiv   pre-print
For this, we propose a framework that makes use of semantic knowledge and estimates the relevance of object pairs during both training and test phases.  ...  A 68.5% relative gain on the recall at 100 is directly related to the relevance estimate and a 32.7% gain to the knowledge distillation.  ...  IK stands for internal knowledge distillation and SK for semantic knowledge distillation. R_e denotes the data-driven relevance estimation (Section 3.1) and R_p the relevance prediction (Section 3.2).  ... 
arXiv:1805.10802v1 fatcat:3a6cqivkanbirpssu6hkw32aca

Distill on the Go: Online knowledge distillation in self-supervised learning [article]

Prashant Bhat, Elahe Arani, Bahram Zonooz
2021 arXiv   pre-print
To address the issue of self-supervised pre-training of smaller models, we propose Distill-on-the-Go (DoGo), a self-supervised learning paradigm using single-stage online knowledge distillation to improve the representation quality of the smaller models.  ...  To address this issue and improve the representation quality of smaller models, we leverage knowledge distillation (KD) [20].  ... 
arXiv:2104.09866v2 fatcat:vinxyk5pufcktdss2ehbtmugrm

X-Distill: Improving Self-Supervised Monocular Depth via Cross-Task Distillation [article]

Hong Cai, Janarbek Matai, Shubhankar Borse, Yizhe Zhang, Amin Ansari, Fatih Porikli
2021 arXiv   pre-print
In this paper, we propose a novel method, X-Distill, to improve the self-supervised training of monocular depth via cross-task knowledge distillation from semantic segmentation to depth estimation.  ...  In order to enable such knowledge distillation across two different visual tasks, we introduce a small, trainable network that translates the predicted depth map to a semantic segmentation map, which can  ...  Knowledge Distillation: Knowledge Distillation is usually used to transfer the knowledge from a more complex model to a smaller model, where both of them are designed for the same visual task [12].  ... 
arXiv:2110.12516v1 fatcat:vxiii3tna5bejfx4xmlabydcwe
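
A sketch of the cross-task translation idea in the record above: a small trainable network maps the predicted depth map to segmentation logits, which are then supervised by a segmentation signal. The layer sizes, the number of classes, and the use of hard labels from a frozen segmentation network are assumptions.

import torch.nn as nn
import torch.nn.functional as F

class DepthToSeg(nn.Module):
    """Small trainable network translating a depth map into segmentation logits."""
    def __init__(self, num_classes=19):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, num_classes, 1),
        )

    def forward(self, depth):
        return self.net(depth)          # depth: (B, 1, H, W) -> (B, num_classes, H, W)

def cross_task_loss(depth_pred, translator, seg_teacher_labels):
    """seg_teacher_labels: (B, H, W) class indices from a frozen segmentation model."""
    seg_logits = translator(depth_pred)
    return F.cross_entropy(seg_logits, seg_teacher_labels)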

Combining prior knowledge with data driven modeling of a batch distillation column including start-up

Pascal F van Lith, Ben H.L Betlem, Brian Roffel
2003 Computers and Chemical Engineering  
The quality is controlled with a PI controller in combination with a PLS estimator, which estimates the quality from online measurements.  ...  Due to bottom exhaustion, for batch distillation this remains a semi-steady state.  ...  (x_1, …, x_n) (10) with n inputs and m rules are represented by the following set of equations, in which m_{j,i} is the membership value for input i in rule j and a is the membership function parameter matrix  ... 
doi:10.1016/s0098-1354(03)00067-x fatcat:aub4vbcj75hozpnj37tshaf3xm
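
The record above controls product quality with a PI controller acting on a PLS-based quality estimate computed from online measurements. A self-contained toy sketch of that loop; the synthetic data, controller gains, number of PLS components, and the choice of manipulated variable are all illustrative assumptions.

import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Fit the PLS quality estimator on historical (measurements, lab quality) data.
X_hist = np.random.rand(200, 8)        # e.g. temperatures, pressures, flows
y_hist = np.random.rand(200, 1)        # lab-analysed product quality
pls = PLSRegression(n_components=3).fit(X_hist, y_hist)

def pi_controller(setpoint, kp=2.0, ki=0.1, dt=1.0):
    integral = 0.0
    def step(measurements):
        nonlocal integral
        quality = float(pls.predict(measurements.reshape(1, -1))[0, 0])
        error = setpoint - quality
        integral += error * dt
        return kp * error + ki * integral   # e.g. adjustment to the reflux ratio
    return step

controller = pi_controller(setpoint=0.95)
u = controller(np.random.rand(8))           # one control move from new measurements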

Distill Knowledge from NRSfM for Weakly Supervised 3D Pose Learning [article]

Chaoyang Wang, Chen Kong, Simon Lucey
2019 arXiv   pre-print
We propose to learn a 3D pose estimator by distilling knowledge from Non-Rigid Structure from Motion (NRSfM). Our method uses solely 2D landmark annotations.  ...  This alleviates the data bottleneck, which is one of the major concerns for supervised methods.  ...  Thereby we have the following definition of the quality function for z, which we use as the loss function for knowledge distilling: L^(i)(z) = min_{ϕ ∈ S^(i)(z)} C^(i)(ϕ). (10) This computes the minimum  ... 
arXiv:1908.06377v1 fatcat:jbnz7jeszjcbpc2uj5ykee67ey
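
A literal transcription of the loss in Eq. (10) of the record above, with candidate_set and cost as abstract callables standing in for the paper's NRSfM-specific construction of S(z) and C:

def nrsfm_distill_loss(z, candidate_set, cost):
    # min over phi in S(z) of C(phi), as in Eq. (10)
    return min(cost(phi) for phi in candidate_set(z))
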
Showing results 1–15 of 90,674