no code implementations • 22 Nov 2023 • Chi-Ping Su, Ching-Hsun Tseng, Shin-Jye Lee
Knowledge Distillation (KD) transfers knowledge from a larger "teacher" model to a compact "student" model, guiding the student with "dark knowledge": the implicit insights encoded in the teacher's soft predictions.
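As a minimal sketch of the idea (not the paper's specific method), the student can be trained to match the teacher's temperature-softened output distribution; the temperature `T` and the `T**2` scaling follow common KD practice:

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax: higher T gives a softer distribution,
    # exposing the teacher's relative confidences across wrong classes.
    exps = [math.exp(z / T) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, T=4.0):
    # KL(teacher || student) on temperature-softened predictions,
    # scaled by T**2 so gradient magnitudes stay comparable across T.
    p = softmax(teacher_logits, T)  # soft teacher targets ("dark knowledge")
    q = softmax(student_logits, T)  # soft student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return (T ** 2) * kl
```

In practice this term is usually combined with the ordinary cross-entropy loss on the ground-truth labels, weighted by a mixing coefficient.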