- Author: Duong T.
- Created at: Jan 21st, 2023
- Source code
PCA is one of the most popular algorithms for dimensionality reduction, but it has a disadvantage: it is unsupervised, so it ignores the labels of the data. Take a look at this picture (source: https://machinelearningcoban.com/2017/06/30/lda/):
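PCA does not see the colors (red and blue) of the samples, so it treats all samples as if they came from the same class, and it picks the direction along which the data has the largest variance as the best component. But when the samples are projected onto that component, the red and blue samples can overlap, which makes the two classes hard to separate.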
LDA (Linear Discriminant Analysis) solves this problem. It is a supervised method: the labels (y) affect the result.
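To make the picture concrete, here is a minimal sketch on hypothetical synthetic data (assuming NumPy): both classes spread widely along the x-axis but are separated along the y-axis. `pca_first_component` ignores the labels and returns the direction of largest variance, while `fisher_direction` uses the two-class special case of LDA, where the best direction is proportional to $S_W^{-1}(m_1 - m_0)$. The helper names, including the `separation` score, are made up for illustration.

```python
import numpy as np

def pca_first_component(X):
    """Direction of largest variance; the labels are never looked at."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    _, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    return eigvecs[:, -1]

def fisher_direction(X, y):
    """Two-class Fisher LDA direction: w proportional to S_W^{-1} (m_1 - m_0)."""
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    S_W = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(S_W, m1 - m0)
    return w / np.linalg.norm(w)

def separation(X, y, w):
    """Hypothetical score: gap between projected class means in within-class std units."""
    z = X @ w
    z0, z1 = z[y == 0], z[y == 1]
    return abs(z1.mean() - z0.mean()) / np.sqrt(0.5 * (z0.var() + z1.var()))

# Hypothetical data: both classes spread widely along x but are separated along y
rng = np.random.default_rng(0)
n = 200
X0 = rng.normal(loc=[0.0, 0.0], scale=[5.0, 0.5], size=(n, 2))
X1 = rng.normal(loc=[0.0, 2.0], scale=[5.0, 0.5], size=(n, 2))
X = np.vstack([X0, X1])
y = np.array([0] * n + [1] * n)

pca_dir = pca_first_component(X)
lda_dir = fisher_direction(X, y)
print("PCA direction:", np.round(pca_dir, 3), "separation:", round(separation(X, y, pca_dir), 2))
print("LDA direction:", np.round(lda_dir, 3), "separation:", round(separation(X, y, lda_dir), 2))
```

On data like this, the PCA direction should come out close to the x-axis with a small separation score, while the LDA direction should be close to the y-axis with a much larger one.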
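Given a dataset $X = \{x_1, x_2, \dots, x_N\}$ with $x_n \in \mathbb{R}^d$, where each sample belongs to one of $C$ classes.

Suppose that $\mathcal{C}_k$ is the set of samples in class $k$, $N_k = |\mathcal{C}_k|$ is the number of samples in it, and

$$m_k = \frac{1}{N_k} \sum_{x_n \in \mathcal{C}_k} x_n, \qquad m = \frac{1}{N} \sum_{n=1}^{N} x_n$$

are the mean of class $k$ and the mean of the whole dataset.

Calculate the "within-class" scatter matrix of each class, and sum them up:

$$S_k = \sum_{x_n \in \mathcal{C}_k} (x_n - m_k)(x_n - m_k)^T, \qquad S_W = \sum_{k=1}^{C} S_k$$

Calculate the "between-class" scatter matrix:

$$S_B = \sum_{k=1}^{C} N_k (m_k - m)(m_k - m)^T$$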
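The two scatter matrices can be computed directly from these definitions. The sketch below assumes NumPy, samples stored as rows of `X`, and class labels `y`; `scatter_matrices` and the toy data are hypothetical. As a sanity check it uses the standard identity that the total scatter $S_T = \sum_{n}(x_n - m)(x_n - m)^T$ equals $S_W + S_B$.

```python
import numpy as np

def scatter_matrices(X, y):
    """Return (S_W, S_B) for data X of shape (N, d) and class labels y of shape (N,)."""
    d = X.shape[1]
    m = X.mean(axis=0)                       # mean of the whole dataset
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for k in np.unique(y):
        X_k = X[y == k]                      # samples of class k
        m_k = X_k.mean(axis=0)               # class mean m_k
        D_k = X_k - m_k
        S_W += D_k.T @ D_k                   # sum of (x_n - m_k)(x_n - m_k)^T
        diff = (m_k - m).reshape(-1, 1)      # column vector m_k - m
        S_B += len(X_k) * (diff @ diff.T)    # N_k (m_k - m)(m_k - m)^T
    return S_W, S_B

# Hypothetical toy data: 3 classes of 30 samples in 4 dimensions
rng = np.random.default_rng(1)
X = rng.normal(size=(90, 4))
y = np.repeat([0, 1, 2], 30)
X[y == 1, 0] += 3.0                          # shift class 1 along the first axis
X[y == 2, 1] += 3.0                          # shift class 2 along the second axis

S_W, S_B = scatter_matrices(X, y)

# Sanity check: the total scatter equals S_W + S_B
D = X - X.mean(axis=0)
S_T = D.T @ D
print(np.allclose(S_T, S_W + S_B))           # True
```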
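In short, we need to find a projection matrix $W \in \mathbb{R}^{d \times d'}$ that maximizes

$$J(W) = \frac{\det\left(W^T S_B W\right)}{\det\left(W^T S_W W\right)}$$

where $S_W$ and $S_B$ are the within-class and between-class scatter matrices. In other words, after projection the samples of each class should stay close together while the class means are pushed far apart. Assuming $S_W$ is invertible, the columns of the optimal $W$ are the eigenvectors of $S_W^{-1} S_B$ associated with its $d'$ largest eigenvalues. Since $S_B$ has rank at most $C - 1$, at most $C - 1$ of these eigenvalues are non-zero, so $d' \le C - 1$.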
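Putting the steps together, here is a minimal NumPy sketch that builds $S_W$ and $S_B$, keeps the eigenvectors of $S_W^{-1} S_B$ with the largest eigenvalues as the columns of $W$, and projects the data with $Z = XW$ (samples are rows, so this is $W^T x_n$ for every sample at once). The function `lda_fit` and the synthetic data are hypothetical; the sketch assumes $S_W$ is invertible, which is expected here because there are far more samples than features.

```python
import numpy as np

def lda_fit(X, y, n_components):
    """Return (W, eigvals): W holds the top eigenvectors of S_W^{-1} S_B as columns."""
    d = X.shape[1]
    m = X.mean(axis=0)
    S_W = np.zeros((d, d))
    S_B = np.zeros((d, d))
    for k in np.unique(y):
        X_k = X[y == k]
        m_k = X_k.mean(axis=0)
        S_W += (X_k - m_k).T @ (X_k - m_k)               # within-class scatter
        diff = (m_k - m).reshape(-1, 1)
        S_B += len(X_k) * (diff @ diff.T)                # between-class scatter
    # Assumes S_W is invertible; solve() gives S_W^{-1} S_B without an explicit inverse
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    eigvals, eigvecs = eigvals.real, eigvecs.real        # drop round-off imaginary parts
    order = np.argsort(eigvals)[::-1]                    # largest eigenvalues first
    return eigvecs[:, order[:n_components]], eigvals[order]

# Hypothetical data: C = 3 Gaussian classes in d = 5 dimensions
rng = np.random.default_rng(2)
C, d, n_per_class = 3, 5, 40
centers = rng.normal(scale=4.0, size=(C, d))
X = np.vstack([rng.normal(loc=c, size=(n_per_class, d)) for c in centers])
y = np.repeat(np.arange(C), n_per_class)

W, eigvals = lda_fit(X, y, n_components=C - 1)
Z = X @ W                        # projected samples
print(Z.shape)                   # (120, 2)
print(np.round(eigvals, 4))      # only the first C - 1 values are clearly non-zero
```

Calling `np.linalg.solve(S_W, S_B)` yields the same matrix as `np.linalg.inv(S_W) @ S_B` without forming the inverse explicitly. For real projects, scikit-learn's `LinearDiscriminantAnalysis` provides a tested implementation.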
If you want to learn more about the mathematics behind LDA, take a look at the references:
https://machinelearningcoban.com/2017/06/30/lda/
https://en.wikipedia.org/wiki/Linear_discriminant_analysis
https://towardsdatascience.com/linear-discriminant-analysis-explained-f88be6c1e00b