In this paper we introduce Iterative Knowledge Distillation (IKD), the process of successively minimizing models based on the Knowledge Distillation (KD) approach. We study two variations of IKD, called parental- and ancestral-training. Both use a single teacher and result in a single student model; the differences arise from which model is used as the teacher. Our results provide support for the utility of the IKD procedure, in the form of increased model compression, without significant loss in accuracy.
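The sketch below illustrates one plausible reading of the IKD loop described above: a standard KD objective (temperature-softened KL divergence plus hard-label cross-entropy) is used to train each successive, smaller student, and the only difference between the two variations is which model supplies the teacher logits. The loss weights, temperature, the `make_student` factory, and the assumption that parental-training reuses the most recent student as teacher while ancestral-training keeps the original model throughout are all illustrative assumptions, not the paper's released code.

```python
# Minimal sketch of iterative knowledge distillation (assumptions noted above).
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft-target term: KL divergence between temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard-label term: ordinary cross-entropy against ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

def distill(teacher, student, loader, epochs=1, lr=1e-3):
    # Train one student against one frozen teacher (single-teacher, single-student step).
    teacher.eval()
    opt = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            with torch.no_grad():
                t_logits = teacher(x)
            loss = kd_loss(student(x), t_logits, y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student

def iterative_kd(make_student, initial_teacher, loader, generations=3, mode="parental"):
    # make_student(g) is a hypothetical factory returning a smaller model each generation.
    # mode="parental": each generation's student becomes the next teacher (assumed naming).
    # mode="ancestral": the original model teaches every generation (assumed naming).
    teacher, student = initial_teacher, None
    for g in range(generations):
        student = distill(teacher, make_student(g), loader)
        if mode == "parental":
            teacher = student
    return student
```

Under this reading, compression accumulates because each generation's student is smaller than its teacher, while the choice of teacher (previous student vs. original model) controls how distillation targets are propagated across generations.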