Distilling knowledge from Neural Networks to build smaller and faster models

This article discusses GPT-2 and BERT models, as well as using knowledge distillation to create highly accurate models with fewer parameters than their teachers.