Knowledge Distillation for Quality Estimation
[article]
2021
arXiv
pre-print
Quality Estimation (QE) is the task of automatically predicting Machine Translation quality in the absence of reference translations, making it applicable in real-time settings, such as translating online ...
Models trained on distilled pre-trained representations remain prohibitively large for many usage scenarios. ...
Introduction Quality Estimation (QE) aims to predict the quality of the output of Machine Translation (MT) systems when no gold-standard translations are available. ...
arXiv:2107.00411v1
fatcat:bkakpl4mwrcajnwyh5ghzw2t3m
Knowledge Distillation for Quality Estimation
2021
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021
unpublished
Quality Estimation (QE) is the task of automatically predicting Machine Translation quality in the absence of reference translations, making it applicable in real-time settings, such as translating online ...
Models trained on distilled pre-trained representations remain prohibitively large for many usage scenarios. ...
Introduction Quality Estimation (QE) aims to predict the quality of the output of Machine Translation (MT) systems when no gold-standard translations are available. ...
doi:10.18653/v1/2021.findings-acl.452
fatcat:3hr72dybb5b3ddjrpwkg65hp7m
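The two records above describe compressing a large Quality Estimation model into a much smaller one. Below is a minimal sketch of that kind of student-teacher setup for sentence-level QE, assuming a frozen teacher and a small student that each emit one quality score per segment; all names and the 0.5 blending weight are assumptions, not the paper's recipe.

# Hypothetical sketch: distill a large sentence-level QE regressor into a small student.
# teacher_scores come from a frozen large model; gold_scores are human quality labels if available.
import torch.nn as nn

def qe_distillation_loss(student_scores, teacher_scores, gold_scores=None, alpha=0.5):
    """Blend regression toward the teacher's predictions with optional gold quality labels."""
    mse = nn.MSELoss()
    loss = mse(student_scores, teacher_scores)              # match the teacher's predictions
    if gold_scores is not None:
        loss = alpha * loss + (1 - alpha) * mse(student_scores, gold_scores)
    return loss

# usage (illustrative): loss = qe_distillation_loss(student(src, mt), teacher(src, mt).detach(), gold)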
Parser-Free Virtual Try-on via Distilling Appearance Flows
[article]
2021
arXiv
pre-print
"teacher knowledge", which is extracted from the real person images in a self-supervised way. (2) Other than using real images as supervisions, we formulate knowledge distillation in the try-on problem ...
as distilling the appearance flows between the person image and the garment image, enabling us to find accurate dense correspondences between them to produce high-quality results. (3) Extensive evaluations ...
Knowledge Distillation. ...
arXiv:2103.04559v2
fatcat:b65zvjkw7nckrcsk76uftl3siq
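A very small sketch of the flow-level distillation the entry above mentions, assuming both teacher and student predict dense 2D appearance-flow fields of the same shape; the L1 distance is an assumed choice, not necessarily the paper's.

# Assumed sketch: match the student's predicted appearance flow to the (detached) teacher flow.
def flow_distillation_loss(student_flow, teacher_flow):
    """Both flows are torch tensors of shape [B, 2, H, W] giving per-pixel 2D offsets."""
    return (student_flow - teacher_flow.detach()).abs().mean()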
Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network
[article]
2021
arXiv
pre-print
We perform knowledge distillation from a GPT2-based context prediction network into a simple recurrent model by minimizing a teacher-student loss defined between the context embedding vectors of those ...
Although this method achieves comparable speech quality to that of a method that waits for the future context, it entails a huge amount of processing for sampling from the language model at each time step ...
CONCLUSIONS We proposed a knowledge distillation method for efficiently estimating contextual embedding while using the linguistic knowledge of a large pre-trained language model. ...
arXiv:2109.10724v1
fatcat:aps77vtzwbgh3a4pcgh7yo3qee
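A rough sketch of the embedding-level teacher-student loss the entry above describes, assuming the teacher is a frozen GPT2-style context predictor and the student is a small GRU; dimensions and names are illustrative, not taken from the paper.

# Illustrative sketch: a lightweight student encoder trained to mimic a large teacher's
# context embedding, so the expensive language model is not needed at synthesis time.
import torch.nn as nn
import torch.nn.functional as F

class StudentContextEncoder(nn.Module):
    def __init__(self, vocab_size=256, emb_dim=128, ctx_dim=768):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, ctx_dim, batch_first=True)

    def forward(self, token_ids):             # token_ids: [B, T]
        h, _ = self.rnn(self.embed(token_ids))
        return h[:, -1]                        # last hidden state as the context embedding

def context_distillation_loss(student_ctx, teacher_ctx):
    # teacher_ctx comes from the frozen large model; L2 distance is the teacher-student loss.
    return F.mse_loss(student_ctx, teacher_ctx.detach())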
Speech Enhancement Using Generative Adversarial Network by Distilling Knowledge from Statistical Method
2019
Applied Sciences
Then, the discriminator network and generator network are re-trained by distilling knowledge from the statistical method, which is inspired by the knowledge distillation in a neural network. ...
This paper presents a new deep neural network (DNN)-based speech enhancement algorithm by integrating the distilled knowledge from the traditional statistical-based method. ...
Table 1. PESQ scores for the speech quality test on the development set and evaluation set for each environment. ...
doi:10.3390/app9163396
fatcat:tgjqxpeb4zcqxdzdj7w2fyj73y
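A hedged sketch of how a generator objective might combine the adversarial term with a distillation term toward the statistical method's enhanced output, in the spirit of the entry above; the L1 distance and the weight lam are assumptions.

# Assumed sketch: generator loss = adversarial term + distillation toward the output of a
# traditional statistical enhancer, weighted by lam.
import torch
import torch.nn.functional as F

def generator_loss(disc_on_fake, enhanced, stat_enhanced, lam=100.0):
    adv = F.binary_cross_entropy_with_logits(disc_on_fake, torch.ones_like(disc_on_fake))
    distill = F.l1_loss(enhanced, stat_enhanced)   # pull toward the statistical method's output
    return adv + lam * distill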
Online Knowledge Distillation for Efficient Pose Estimation
[article]
2022
arXiv
pre-print
One promising technique to obtain an accurate yet lightweight pose estimator is knowledge distillation, which distills the pose knowledge from a powerful teacher model to a less-parameterized student model ...
Existing state-of-the-art human pose estimation methods require heavy computational resources for accurate predictions. ...
Figure 2. An overview of the proposed Online Knowledge Distillation for Human Pose estimation (OKDHP). Each branch serves as an independent pose estimator. ...
arXiv:2108.02092v2
fatcat:gejqfkjltveytpxi6nid3ctzgm
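A simplified sketch of the online-distillation idea in the entry above: each branch predicts heatmaps, and every branch is pulled toward an aggregate of all branches. The plain averaging used here is an assumption, not necessarily the paper's gating mechanism.

# Simplified sketch: multi-branch online distillation for heatmap-based pose estimation.
import torch
import torch.nn.functional as F

def online_pose_kd_loss(branch_heatmaps, gt_heatmaps, beta=0.5):
    """branch_heatmaps: list of tensors [B, K, H, W], one per branch."""
    ensemble = torch.stack(branch_heatmaps).mean(dim=0).detach()  # aggregated "teacher"
    loss = 0.0
    for hm in branch_heatmaps:
        task = F.mse_loss(hm, gt_heatmaps)        # supervised heatmap regression
        distill = F.mse_loss(hm, ensemble)        # pull each branch toward the ensemble
        loss = loss + task + beta * distill
    return loss / len(branch_heatmaps)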
Symbolic Knowledge Distillation: from General Language Models to Commonsense Models
[article]
2021
arXiv
pre-print
Empirical results demonstrate that, for the first time, a human-authored commonsense knowledge graph is surpassed by our automatically distilled variant in all three criteria: quantity, quality, and diversity ...
Our study leads to a new framework, Symbolic Knowledge Distillation. As with prior art in Knowledge Distillation (Hinton et al., 2015), our approach uses larger models to teach smaller models. ...
part by the Natural Sciences and Engineering Research Council of Canada (NSERC) (funding reference number 401233309), DARPA MCS program through NIWC Pacific (N66001-19-2-4031), and the Allen Institute for ...
arXiv:2110.07178v1
fatcat:5vubwnf6ybh3dbp7oefflcq7t4
Structured Knowledge Distillation for Dense Prediction
[article]
2020
arXiv
pre-print
Previous knowledge distillation strategies used for dense prediction tasks often directly borrow the distillation scheme for image classification and perform knowledge distillation for each pixel separately ...
The effectiveness of our knowledge distillation approaches is demonstrated by experiments on three dense prediction tasks: semantic segmentation, depth estimation and object detection. ...
Thus, we only use the structured knowledge distillation in depth estimation task. ...
arXiv:1903.04197v7
fatcat:vcwpcgffgndwfo3xye3kdjtm24
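For orientation, the per-pixel baseline that the entry above contrasts with structured distillation is simply a KL divergence applied independently at every pixel of the score maps; a minimal sketch follows, with the temperature T and all names assumed.

# Minimal sketch of pixel-wise distillation for dense prediction (the baseline the paper improves on).
import torch.nn.functional as F

def pixelwise_kd_loss(student_logits, teacher_logits, T=1.0):
    """Logits have shape [B, C, H, W]; the KL is computed over classes at each pixel."""
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (T * T)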
Knowledge Distillation: A Method for Making Neural Machine Translation More Efficient
2022
Information
We show that sequence-level knowledge distillation can be used to train small student models on knowledge distilled from large teacher models. ...
In this work, we investigate knowledge distillation on a simulated low-resource German-to-English translation task. ...
= k | x; θ). (2) We can now use L_KD to define functions for knowledge distillation for NMT. ...
doi:10.3390/info13020088
fatcat:6wfjd3mrdrccrf3acx4oq4sz4y
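Equation (2) in the snippet above is cut off by the search excerpt; what follows is a hedged sketch of the standard word-level KD objective it appears to refer to, in which the teacher distribution q serves as a soft target for the student distribution p. Names and the temperature T are assumptions.

# Hedged sketch of a word-level KD loss for NMT: the student's token distribution p(y|x; theta)
# is trained toward the teacher's distribution q(y|x; theta_T) via cross-entropy.
import torch.nn.functional as F

def word_level_kd_loss(student_logits, teacher_logits, T=1.0):
    """Both logit tensors have shape [batch, seq_len, vocab]."""
    q = F.softmax(teacher_logits / T, dim=-1)             # teacher soft targets
    log_p = F.log_softmax(student_logits / T, dim=-1)     # student log-probabilities
    return -(q * log_p).sum(dim=-1).mean() * (T * T)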
Knowledge Distillation Leveraging Alternative Soft Targets from Non-Parallel Qualified Speech Data
[article]
2021
arXiv
pre-print
This paper describes a novel knowledge distillation framework that leverages acoustically qualified speech data included in an existing training data pool as privileged information. ...
In our proposed framework, a student network is trained with multiple soft targets for each utterance that consist of main soft targets from original speakers' utterance and alternative targets from other ...
In the previous literature, for knowledge distillation with privileged information, time-aligned same speakers' utterances are used to estimate q(i|x) and p(i|x). ...
arXiv:2112.08878v1
fatcat:23ess57tfrhwpiy4ny5qyh7654
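A small sketch of what "multiple soft targets per utterance" could look like in practice, assuming the main and alternative teacher posteriors are simply interpolated before the usual KD cross-entropy; the weighting scheme is an assumption, not the paper's exact recipe.

# Assumed illustration: combine main and alternative soft targets before the KD cross-entropy.
import torch.nn.functional as F

def multi_target_kd_loss(student_logits, main_posteriors, alt_posteriors, w_alt=0.3):
    """Posteriors are teacher distributions over classes, shape [batch, frames, classes]."""
    soft_targets = (1 - w_alt) * main_posteriors + w_alt * alt_posteriors
    log_p = F.log_softmax(student_logits, dim=-1)
    return -(soft_targets * log_p).sum(dim=-1).mean()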
Visual Relationship Detection Based on Guided Proposals and Semantic Knowledge Distillation
[article]
2018
arXiv
pre-print
For this, we propose a framework that makes use of semantic knowledge and estimates the relevance of object pairs during both training and test phases. ...
A 68.5% relative gain on the recall at 100 is directly related to the relevance estimate and a 32.7% gain to the knowledge distillation. ...
IK stands for internal knowledge distillation and SK for semantic knowledge distillation. R_e denotes the data-driven relevance estimation (Section 3.1) and R_p the relevance prediction (Section 3.2). ...
arXiv:1805.10802v1
fatcat:3a6cqivkanbirpssu6hkw32aca
Distill on the Go: Online knowledge distillation in self-supervised learning
[article]
2021
arXiv
pre-print
To address the issue of self-supervised pre-training of smaller models, we propose Distill-on-the-Go (DoGo), a self-supervised learning paradigm using single-stage online knowledge distillation to improve ...
the representation quality of the smaller models. ...
To address this issue and improve the representation quality of smaller models, we leverage knowledge distillation (KD) [20] . ...
arXiv:2104.09866v2
fatcat:vinxyk5pufcktdss2ehbtmugrm
X-Distill: Improving Self-Supervised Monocular Depth via Cross-Task Distillation
[article]
2021
arXiv
pre-print
In this paper, we propose a novel method, X-Distill, to improve the self-supervised training of monocular depth via cross-task knowledge distillation from semantic segmentation to depth estimation. ...
In order to enable such knowledge distillation across two different visual tasks, we introduce a small, trainable network that translates the predicted depth map to a semantic segmentation map, which can ...
Knowledge Distillation: Knowledge Distillation is usually used to transfer the knowledge from a more complex model to a smaller model, where both of them are designed for the same visual task [12] . ...
arXiv:2110.12516v1
fatcat:vxiii3tna5bejfx4xmlabydcwe
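A very rough sketch of the cross-task idea in the entry above: a small translator network maps the predicted depth to segmentation logits, which are then supervised by a segmentation teacher. The architectures and the cross-entropy choice here are placeholders, not the paper's design.

# Rough sketch of cross-task distillation: depth predictions are translated to segmentation
# logits and supervised by a (frozen) segmentation teacher. Architectures are placeholders.
import torch.nn as nn
import torch.nn.functional as F

class DepthToSegTranslator(nn.Module):
    def __init__(self, num_classes=19):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, num_classes, 1),
        )

    def forward(self, depth):                 # depth: [B, 1, H, W]
        return self.net(depth)                # [B, num_classes, H, W]

def cross_task_distill_loss(pred_depth, teacher_seg_logits, translator):
    """Align the translated depth with the segmentation teacher's (detached) prediction."""
    seg_from_depth = translator(pred_depth)
    teacher_labels = teacher_seg_logits.detach().argmax(dim=1)
    return F.cross_entropy(seg_from_depth, teacher_labels)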
Combining prior knowledge with data driven modeling of a batch distillation column including start-up
2003
Computers and Chemical Engineering
The quality is controlled with a PI controller in combination with a PLS estimator, which estimates the quality from online measurements. ...
Due to bottom exhaustion, for batch distillation this remains a semi-steady state. ...
…; x_n) (10) with n inputs and m rules are represented by the following set of equations, in which m_{j,i} is the membership value for input i in rule j and a is the membership function parameter matrix ...
doi:10.1016/s0098-1354(03)00067-x
fatcat:aub4vbcj75hozpnj37tshaf3xm
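Since the equation fragment in the snippet above is garbled by extraction, here is a generic Takagi-Sugeno-style fuzzy model in LaTeX for orientation only; it reuses the snippet's symbols (m rules, n inputs, membership values m_{j,i}) but is not necessarily the paper's exact equation (10).

% Generic fuzzy rule-based model with n inputs and m rules (orientation only).
\hat{y}(x_1,\dots,x_n)
  = \frac{\sum_{j=1}^{m} \beta_j \, y_j}{\sum_{j=1}^{m} \beta_j},
\qquad
\beta_j = \prod_{i=1}^{n} m_{j,i}(x_i),

where \beta_j is the firing strength of rule j and y_j its rule consequent.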
Distill Knowledge from NRSfM for Weakly Supervised 3D Pose Learning
[article]
2019
arXiv
pre-print
We propose to learn a 3D pose estimator by distilling knowledge from Non-Rigid Structure from Motion (NRSfM). Our method uses solely 2D landmark annotations. ...
This alleviates the data bottleneck, which is one of the major concerns for supervised methods. ...
Thereby we have the following definition of the quality function for z, which we use as the loss function for knowledge distillation: $L^{(i)}(z) = \min_{\phi \in S^{(i)}(z)} C^{(i)}(\phi)$. (10) This computes the minimum ...
arXiv:1908.06377v1
fatcat:jbnz7jeszjcbpc2uj5ykee67ey
Showing results 1 — 15 out of 90,674 results