An Empirical Study of Iterative Knowledge Distillation for Neural Network Compression

Sharan Yalburgi, Tirtharaj Dash, Ramya Hebbalaguppe, Srinidhi Hegde, Ashwin Srinivasan
2020 European Symposium on Artificial Neural Networks (ESANN)
In this paper we introduce Iterative Knowledge Distillation (IKD), a procedure that successively compresses models using the Knowledge Distillation (KD) approach of [1]. We study two variations of IKD, called parental- and ancestral-training. Both use a single teacher and result in a single student model; the differences arise from which model is used as the teacher. Our results provide support for the utility of the IKD procedure, in the form of increased model compression without significant losses in predictive accuracy. An important task in IKD is choosing the right model(s) to act as a teacher for a subsequent iteration. Across the variations of IKD studied, our results suggest that the most recent model constructed (parental-training) is the best single teacher for the model in the next iteration. This result suggests that training in IKD can proceed without requiring us to keep all models in the sequence.
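The sketch below illustrates the parental-training variant described in the abstract: each iteration distills the most recent model into a smaller student, which then serves as the sole teacher for the next iteration. It is a minimal illustration, not the authors' implementation; the KD objective follows the standard softened-logit formulation of [1], and the toy architectures, temperature, loss weighting, and layer widths are assumptions made here for brevity.

```python
# Minimal sketch of parental-training IKD (assumed hyperparameters and toy models).
import torch
import torch.nn as nn
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.7):
    """Standard KD objective [1]: softened teacher-student KL term plus hard-label CE."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard

def make_student(width):
    """Toy fully connected student; `width` controls capacity (i.e. compression)."""
    return nn.Sequential(nn.Flatten(), nn.Linear(784, width), nn.ReLU(), nn.Linear(width, 10))

def distill(teacher, student, loader, epochs=1, device="cpu"):
    """One KD round: train `student` to mimic `teacher` on the data in `loader`."""
    teacher.to(device).eval()
    student.to(device).train()
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            with torch.no_grad():
                t_logits = teacher(x)
            loss = kd_loss(student(x), t_logits, y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return student

def iterative_kd_parental(initial_teacher, widths, loader):
    """Parental-training IKD: the student from each iteration becomes the next teacher."""
    teacher = initial_teacher
    for w in widths:  # successively smaller students, e.g. [512, 256, 128]
        student = distill(teacher, make_student(w), loader)
        teacher = student  # only the most recent model needs to be kept
    return teacher  # final, most compressed student
```

Ancestral-training differs only in how the single teacher is selected: instead of always taking the immediate parent, the teacher for an iteration may be an earlier model in the sequence, which is why it requires keeping more of the intermediate models around.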