Knowledge Distillation (KD) is used here to obtain mask-invariant feature vectors, so that the model focuses on the non-occluded regions of the face. Our approach jointly learns two things: to recognize expressions correctly on both masked and non-masked faces, and to pull the embedding of each masked image closer to the embedding of its corresponding non-masked image. The second objective is realized through embedding-level KD: the student model processes a masked image and is trained to produce an embedding similar to the one the teacher model produces for the non-masked version. In this way, KD teaches the student to disregard the non-expression-related information introduced by the mask.
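The joint objective described above can be illustrated with a minimal sketch for a single masked sample. The text does not specify the distance measure or how the two terms are weighted, so the mean squared error used here, the `kd_weight` parameter, and all function names are assumptions made only for illustration, not the exact formulation of our method.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one sample, computed in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def embedding_distance(student_emb, teacher_emb):
    """Mean squared error between two embeddings (an assumed choice of distance)."""
    assert len(student_emb) == len(teacher_emb)
    return sum((s - t) ** 2 for s, t in zip(student_emb, teacher_emb)) / len(student_emb)

def total_loss(student_logits, label, student_emb, teacher_emb, kd_weight=1.0):
    """Classification loss plus a weighted embedding-level KD term.

    student_emb: student embedding of the masked image.
    teacher_emb: teacher embedding of the corresponding non-masked image,
                 treated as a fixed target.
    kd_weight:   hypothetical balancing factor between the two terms.
    """
    ce = cross_entropy(student_logits, label)
    kd = embedding_distance(student_emb, teacher_emb)
    return ce + kd_weight * kd

# Toy values: same logits, but one student embedding is closer to the teacher's.
teacher_emb = [0.2, -0.1, 0.7]
student_far = [0.9, 0.4, -0.3]
student_near = [0.25, -0.05, 0.65]
logits, label = [2.0, 0.5, -1.0], 0

loss_far = total_loss(logits, label, student_far, teacher_emb)
loss_near = total_loss(logits, label, student_near, teacher_emb)
```

With identical logits, `loss_near` is smaller than `loss_far`, so minimizing this objective rewards the student for matching the teacher's non-masked embedding while still classifying the expression correctly.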
[Figure: two panels, "Without KD" and "With KD".]