diff --git a/slides/regularization/slides-regu-l1l2-2.tex b/slides/regularization/slides-regu-l1l2-2.tex
index a242935b..d9854306 100644
--- a/slides/regularization/slides-regu-l1l2-2.tex
+++ b/slides/regularization/slides-regu-l1l2-2.tex
@@ -70,12 +70,12 @@ \end{vbframe}
 \begin{vbframe}{$L1$ and $L2$ Reg. with Orthonormal Design}
-For the special case of orthonormal design $\Xmat^{\top}\Xmat=\id$ we can derive closed-form a solution in terms of $\thetah_{\text{OLS}}=(\Xmat^{\top}\Xmat)^{-1}\Xmat^{\top}\yv=\Xmat^{\top}\yv$:
+For special case of orthonormal design $\Xmat^{\top}\Xmat=\id$ we can derive closed-form a solution in terms of $\thetah_{\text{OLS}}=(\Xmat^{\top}\Xmat)^{-1}\Xmat^{\top}\yv=\Xmat^{\top}\yv$:
 $$\thetah_{\text{Lasso}}=\text{sign}(\thetah_{\text{OLS}})(\vert \thetah_{\text{OLS}} \vert - \lambda)_{+}\quad(\text{sparsity})$$
 Function $S(\theta,\lambda):=\text{sign}(\theta)(|\theta|-\lambda)_{+}$ is called \textbf{soft thresholding} operator: For $|\theta|<\lambda$ it returns $0$, whereas params $|\theta|>\lambda$ are shrunken toward $0$ by $\lambda$.\\
 \vspace{0.2cm}
-Comparing this to $\thetah_{\text{Ridge}}$ under orthonormal design we see qualitatively different behavior as $\lambda \uparrow$:
+Comparing this to $\thetah_{\text{Ridge}}$ under orthonormal design: %we see qualitatively different behavior as $\lambda \uparrow$:
 $$\thetah_{\text{Ridge}}= ({\Xmat}^T \Xmat + \lambda \id)^{-1} \Xmat^T\yv=((1+\lambda)\id)^{-1}\thetah_{\text{OLS}} = \frac{\thetah_{\text{OLS}}}{1+\lambda}$$
 While soft threshold ensures exact zeros in solution, $L2$ penalty uniformly downscales parameters (no sparsity).
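
As a quick sanity check on the two closed forms in this hunk, here is a minimal NumPy sketch. It is not part of the slides or the patch. The helper names `soft_threshold` and `ridge_orthonormal`, the random data, and the value of `lam` are all assumptions chosen for illustration. The orthonormal design is built with a QR decomposition so that $\Xmat^{\top}\Xmat=\id$ holds up to rounding.

```python
import numpy as np


def soft_threshold(theta, lam):
    """Soft thresholding S(theta, lam) = sign(theta) * (|theta| - lam)_+, elementwise."""
    theta = np.asarray(theta, dtype=float)
    return np.sign(theta) * np.maximum(np.abs(theta) - lam, 0.0)


def ridge_orthonormal(theta_ols, lam):
    """Ridge estimate under orthonormal design: theta_ols / (1 + lam)."""
    return np.asarray(theta_ols, dtype=float) / (1.0 + lam)


# Illustrative data (assumption, not from the slides): orthonormal columns via QR.
rng = np.random.default_rng(0)
X, _ = np.linalg.qr(rng.normal(size=(50, 3)))  # X.T @ X is the identity up to rounding
y = rng.normal(size=50)
lam = 0.1  # hypothetical penalty strength

theta_ols = X.T @ y  # OLS reduces to X^T y because X^T X = I
theta_lasso = soft_threshold(theta_ols, lam)
theta_ridge = ridge_orthonormal(theta_ols, lam)

# The ridge shortcut should agree with the general formula (X^T X + lam I)^{-1} X^T y.
theta_ridge_general = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
assert np.allclose(theta_ridge, theta_ridge_general)

print("OLS:  ", theta_ols)
print("Lasso:", theta_lasso)
print("Ridge:", theta_ridge)
```

For a hand-checkable case, take $\thetah_{\text{OLS}}=(3,\,-0.5,\,1.2)^{\top}$ and $\lambda=1$: soft thresholding gives $(2,\,0,\,0.2)^{\top}$, with the middle coefficient set exactly to zero, while the ridge formula gives $(1.5,\,-0.25,\,0.6)^{\top}$, where every coefficient is halved and none is zero.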