This repository contains detailed proofs of the bias-variance decomposition for KL divergence. Hopefully, it helps with a better understanding of Heskes's paper and the recent ICLR paper "Rethinking Soft Labels for Knowledge Distillation: A Bias-Variance Tradeoff Perspective". It also serves as a brief answer to Exercise 7.3 of EE2211.
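For context, the central result (as I understand it from Heskes's paper; the notation below is my own and may differ slightly from the proofs in this repo) decomposes the expected KL divergence between the target distribution and an estimated distribution into a bias term and a variance term:

```latex
% Hedged sketch of the decomposition, following Heskes (1998).
% t      : true (target) distribution
% q_D    : distribution estimated from a random training set D
% \bar{q}: normalized geometric mean of q_D over D (Z is the normalizer)
\mathbb{E}_D\!\left[\mathrm{KL}(t \,\|\, q_D)\right]
  = \underbrace{\mathrm{KL}(t \,\|\, \bar{q})}_{\text{bias}}
  + \underbrace{\mathbb{E}_D\!\left[\mathrm{KL}(\bar{q} \,\|\, q_D)\right]}_{\text{variance}},
\qquad
\bar{q}(y) = \frac{1}{Z}\exp\!\left(\mathbb{E}_D\!\left[\log q_D(y)\right]\right).
```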
Please feel free to correct me if my understanding of any aspect is wrong.
- Briefly answered on math.stackexchange.com
- Paper: Information-Theoretic Variable Selection and Network Inference from Microarray Data
- Paper: Bias/Variance Decompositions for Likelihood-Based Estimators
- Book: Notes for EE2211 Introduction to Machine Learning
If you find this repo useful, please cite:
```bibtex
@misc{proof4biasvariance,
  author       = {Shuan},
  title        = {Bias-Variance-Decomposition-for-KL-Divergence},
  howpublished = {\url{https://github.com/HolmesShuan/Bias-Variance-Decomposition-for-KL-Divergence}},
  year         = {2021}
}
```