https://bradleyboehmke.github.io/HOML/mars.html
- lasso can be used to identify and extract those features with the largest (and most consistent) signal.
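A minimal sketch of that idea, using scikit-learn's LassoCV on synthetic data (both the class and the data are my additions, not from the notes): the features whose coefficients survive the L1 penalty are the ones carrying signal.

```python
# Minimal sketch: use the lasso's sparsity to identify features carrying signal.
# Synthetic data for illustration only.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV

X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)

# LassoCV picks the regularization strength alpha by cross-validation.
model = LassoCV(cv=5, random_state=0).fit(X, y)

# Features whose coefficients survived the L1 penalty (non-zero) are "selected".
selected = np.flatnonzero(model.coef_)
print("selected feature indices:", selected)
```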
Group Lasso
- the lasso penalty has a kink at the point where a coefficient equals 0, which is what lets it set coefficients exactly to 0
- for ridge regression, the larger lambda is, the closer the slope (coefficient) gets to 0, but it never reaches exactly 0 (see the sketch below)
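A small numeric sketch of the contrast above, assuming an orthonormal design so both estimators have closed-form one-dimensional updates (soft-thresholding for lasso, proportional shrinkage for ridge); the numbers are made up:

```python
# Sketch: why the lasso's kink at zero produces exact zeros while ridge only shrinks.
# Assumes an orthonormal design, where both estimators have closed forms.
import numpy as np

def lasso_1d(b_ols, lam):
    # Soft-thresholding: the kink at 0 snaps small coefficients exactly to 0.
    return np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam, 0.0)

def ridge_1d(b_ols, lam):
    # Proportional shrinkage: coefficients approach 0 as lambda grows, never reach it.
    return b_ols / (1.0 + lam)

b = np.array([3.0, 0.5, -0.2])
for lam in (0.1, 1.0, 10.0):
    print(lam, lasso_1d(b, lam), ridge_1d(b, lam))
```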
- suitable for the case where the number of predictors is much larger than the number of observations (p >> n)
- partial least squares (PLS): supervised learning that considers the projection to latent structures
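If this refers to PLS, here is a minimal sketch with scikit-learn's PLSRegression on synthetic data (the class choice and data are my assumptions); because PLS is supervised, its components are chosen to explain y, not just the variance of X:

```python
# Sketch: PLS ("projection to latent structures") is supervised, so its
# components are chosen to explain y, not just the variance of X.
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=30, n_informative=5,
                       random_state=0)

pls = PLSRegression(n_components=2).fit(X, y)
X_latent = pls.transform(X)   # projection of X onto the latent structure
print(X_latent.shape)         # (100, 2)
print(pls.score(X, y))        # R^2 on the training data
```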
influential points for GLM models
- minimize the $\epsilon$-insensitive loss function along with the $\frac{1}{2}w^Tw$ regularization term, where $|y_i-f(x_i)|_\epsilon=\max(0,\,|y_i-f(x_i)|-\epsilon)$ is the $\epsilon$-insensitive loss function

  $\min_{w,b}\ \frac{1}{2}w^Tw+C\sum_{i=1}^{l}(\kappa_i+\kappa_i^*)$ subject to $y_i-(A_iw+b)\le\epsilon+\kappa_i$, $(A_iw+b)-y_i\le\epsilon+\kappa_i^*$, $\kappa_i,\kappa_i^*\ge 0$
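A hedged sketch of the same idea with scikit-learn's SVR: its epsilon parameter is the half-width of the insensitive tube and C weights the slack terms $\kappa_i,\kappa_i^*$ in the objective above; the RBF kernel and synthetic data are illustrative choices, not from the notes.

```python
# Sketch: epsilon = half-width of the insensitive tube (residuals smaller than
# epsilon incur no loss); C weights the slack variables in the objective.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

svr = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, y)
print(svr.predict([[0.5]]))
```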
- minimize the quadratic loss function along with the $\frac{1}{2}w^Tw$ regularization term
- use Huber loss (improves on the squared loss function's robustness to outliers)
- SGDClassifier
- classification loss functions: hinge, log, modified_huber, squared_hinge
- regression loss functions: squared_error, huber, epsilon_insensitive
- log loss: gives logistic regression
- modified_huber: smooth loss that brings tolerance to outliers as well as probability estimates
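A quick sketch of that last point (synthetic data and parameter choices are mine): with modified_huber, SGDClassifier exposes predict_proba, while the default hinge loss only offers a decision function.

```python
# Sketch: modified_huber gives SGDClassifier probability estimates via
# predict_proba, unlike the plain hinge loss.
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

X, y = make_classification(n_samples=500, random_state=0)

clf = SGDClassifier(loss="modified_huber", random_state=0).fit(X, y)
print(clf.predict_proba(X[:3]))   # available because the loss is smooth

# hinge loss (the default) only exposes decision_function, not probabilities
svm_like = SGDClassifier(loss="hinge", random_state=0).fit(X, y)
print(svm_like.decision_function(X[:3]))
```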
- TSVR (twin support vector regression) estimates two non-parallel hyperplanes by solving two quadratic programming problems (QPPs)
- focal loss down-weights easy examples, because it aims to focus training as much as possible on hard examples (see the sketch below)
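A minimal NumPy sketch of binary focal loss showing how the $(1-p_t)^\gamma$ factor does that down-weighting; gamma=2 and alpha=0.25 follow the common RetinaNet settings, and the data below is made up:

```python
# Sketch of binary focal loss: (1 - p_t)^gamma down-weights easy examples
# (p_t near 1) so training concentrates on hard ones. gamma=2, alpha=0.25
# are the values commonly used in the RetinaNet paper.
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    # p: predicted probability of the positive class; y: labels in {0, 1}
    p_t = np.where(y == 1, p, 1.0 - p)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t + eps)

y = np.array([1, 1, 0])
p = np.array([0.95, 0.55, 0.10])   # easy positive, hard positive, easy negative
print(focal_loss(p, y))            # the hard example dominates the loss
```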