Distillation for High-Quality Knowledge Extraction via Explainable Oracle Approach


Recent successes suggest that knowledge distillation can usefully transfer knowledge between deep neural networks as a compression and acceleration technique, e.g., by effectively and reliably compressing a large teacher model into a smaller student model for resource-limited settings. However, knowledge distillation performance degrades when the compression rate becomes excessively high because the teacher model is much larger than the student. To address this, we propose improving teacher-to-student knowledge transfer by identifying and reinforcing the input-level signals that contribute most to the final prediction; e.g., signals corresponding to an elephant's long trunk are strengthened and transferred to the student model. To this end, we adopt gradient-based explainable AI techniques to extract output-relevant input-level features. We then strengthen these signals and transfer them to the student to improve knowledge distillation performance. Our experiments on public datasets (i.e., CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet) show that our method clearly outperforms existing knowledge distillation approaches, especially when a small teacher model is used. Our code is available at https://github.com/myunghakLee/Distillation-for-High-Quality-Knowledge-Extraction
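To make the abstract's three-step idea more concrete, here is a toy sketch of (1) gradient-based input attribution, (2) strengthening the most relevant input features, and (3) a distillation loss on the strengthened input. This is not the authors' implementation; for their method, refer to the linked repository. Several simplifying assumptions apply. The teacher is a hypothetical linear model `z = W @ x`, so its input gradient can be written in closed form. Relevance is measured with plain gradient-times-input. The strengthening rule `reinforce_input` and its `alpha` parameter are illustrative inventions. The loss is the standard temperature-scaled KL distillation term. All function names are hypothetical.

```python
import numpy as np

def softmax(z, temperature=1.0):
    # Temperature-scaled softmax over the last axis, numerically stabilized.
    z = np.asarray(z, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def linear_teacher_saliency(W, x, target):
    # Gradient-based attribution for a hypothetical linear teacher z = W @ x.
    # Since d z[target] / d x = W[target], |gradient * input| scores each feature.
    grad = W[target]
    return np.abs(grad * x)

def reinforce_input(x, saliency, alpha=0.5):
    # Hypothetical strengthening rule (an assumption, not the paper's):
    # scale each feature by 1 + alpha * (saliency normalized to [0, 1]).
    s = saliency / (saliency.max() + 1e-12)
    return x * (1.0 + alpha * s)

def kd_loss(student_logits, teacher_logits, temperature=4.0):
    # Standard Hinton-style distillation term: T^2 * KL(p_teacher || p_student).
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    return float(temperature ** 2 * np.mean(kl))

# Usage on random toy data: a 3-class teacher and student over 8 input features.
rng = np.random.default_rng(0)
W_teacher = rng.normal(size=(3, 8))
W_student = rng.normal(size=(3, 8))
x = rng.normal(size=8)
target = int(np.argmax(W_teacher @ x))          # teacher's predicted class
sal = linear_teacher_saliency(W_teacher, x, target)
x_strong = reinforce_input(x, sal, alpha=0.5)    # amplify output-relevant features
loss = kd_loss(W_student @ x_strong, W_teacher @ x_strong)
print(f"KD loss on reinforced input: {loss:.4f}")
```

In a real network, the closed-form gradient would be replaced by automatic differentiation of the teacher's target logit with respect to the input image. The distillation term would then be combined with the usual cross-entropy loss on ground-truth labels.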

In British Machine Vision Conference 2023