diff --git a/slides/advriskmin/figure/telephone-data.pdf b/slides/advriskmin/figure/telephone-data.pdf
index 15054624..4df703de 100644
Binary files a/slides/advriskmin/figure/telephone-data.pdf and b/slides/advriskmin/figure/telephone-data.pdf differ
diff --git a/slides/advriskmin/rsrc/telephone-data.R b/slides/advriskmin/rsrc/telephone-data.R
index 3e6da80e..6123eada 100644
--- a/slides/advriskmin/rsrc/telephone-data.R
+++ b/slides/advriskmin/rsrc/telephone-data.R
@@ -74,7 +74,7 @@ res.tel$pred_logcosh <- preds.logcosh
 p <- ggplot(res.tel, aes(x = year)) +
   geom_line(aes(y = pred_l2, color = "L2 (OLS)"), size = 1.6, alpha=1) +
   #geom_line(aes(y = pred_l1, color = "L1"), size = 1.4) +
-  geom_line(aes(y = pred_l1_manual, color = "L1 (man)"), size = 1.6, alpha=1) +
+  geom_line(aes(y = pred_l1_manual, color = "L1"), size = 1.6, alpha=1) +
   geom_line(aes(y = pred_huber, color = "Huber"), size = 1.6, alpha=1) +
   geom_line(aes(y = pred_logcosh, color = "Log-Cosh"), size = 1.6, alpha=1) +
   geom_point(aes(y = calls), color = "black", size = 4, alpha=1) +
diff --git a/slides/advriskmin/slides-advriskmin-proper-scoring-rules.tex b/slides/advriskmin/slides-advriskmin-proper-scoring-rules.tex
index 3f84dff6..7ec1e0cc 100644
--- a/slides/advriskmin/slides-advriskmin-proper-scoring-rules.tex
+++ b/slides/advriskmin/slides-advriskmin-proper-scoring-rules.tex
@@ -105,38 +105,42 @@
 \begin{vbframe}{Binary classification scores}
 
-To find strictly proper scores, we can ask: Which functions have the property such that $\E_y[S(\pi,y)]$ is maximized at $\pi=p$? We have
+To find strictly proper scores/losses, we can ask: Which functions have the property that $\E_y[L(y,\pi)]$ is minimized at $\pi=p$? We have
 
-$$\E_{y}[S(\pi, y)]=p \cdot S(\pi, 1) + (1-p) \cdot S(\pi, 0)$$
+$$\E_{y}[L(y,\pi)]=p \cdot L(1,\pi) + (1-p) \cdot L(0,\pi)$$
 
-Let's further restrict our search to scores $S(\pi,y)$ for which $S(\pi,1)=S(\pi)$ and $S(\pi, 0)=S(1-\pi)$:
+Let's further %restrict our search to scores
+assume that $L(1,\pi)$ and $L(0, \pi)$ cannot be arbitrary, but are the same function evaluated at $\pi$ and $1-\pi$:
+%$S(\pi,y)$ for which
+$L(1,\pi)=L(\pi)$ and $L(0,\pi)=L(1-\pi)$. Then
 
-$$\E_{y}[S(\pi, y)]=p \cdot S(\pi) + (1-p) \cdot S(1-\pi)$$
+$$\E_{y}[L(y,\pi)]=p \cdot L(\pi) + (1-p) \cdot L(1-\pi)$$
 
 \vspace{0.2cm}
 
-Setting the derivative w.r.t. $\pi$ to $0$ and requiring $\pi=p$ at the optimum, we get the following first-order condition:
+Setting the derivative w.r.t. $\pi$ to $0$ and requiring $\pi=p$ at the optimum (\textbf{propriety}), we get the following first-order condition (F.O.C.):
 
 \vspace{0.3cm}
 
-$$p \cdot S'(p) \overset{!}{=} (1-p) \cdot S'(1-p)$$
+$$p \cdot L'(p) \overset{!}{=} (1-p) \cdot L'(1-p)$$
 
 \framebreak
 
 \begin{itemize}\setlength\itemsep{1.9em}
-  \item F.O.C.:\quad $p \cdot S'(p) \overset{!}{=} (1-p) \cdot S'(1-p)$
-  \item One natural solution is $S'(p)=1/p$, resulting in $p/p=(1-p)/(1-p)=1$ and $S(p)=\log(p)$.
-  \item This is the \textbf{logarithmic scoring rule} $S(\pi,y)=y \cdot \log(\pi) + (1-y) \cdot \log(1-\pi)$
-  \item Under (loss) minimization this corresponds to the \textbf{log loss} $L(y,\pi)=-(y \cdot \log(\pi) + (1-y) \cdot \log(1-\pi))$
+  \item F.O.C.:\quad $p \cdot L'(p) \overset{!}{=} (1-p) \cdot L'(1-p)$
+  \item One natural solution is $L'(p)=-1/p$, resulting in $-p/p=-(1-p)/(1-p)=-1$ and the antiderivative $L(p)=-\log(p)$.
+  \item This is the \textbf{log loss} $$L(y,\pi)=-(y \cdot \log(\pi) + (1-y) \cdot \log(1-\pi))$$
+  \item The corresponding scoring rule (maximization) is the strictly proper \textbf{logarithmic scoring rule} $$S(\pi,y)=y \cdot \log(\pi) + (1-y) \cdot \log(1-\pi)$$
 \end{itemize}
 
 \framebreak
 
-\begin{itemize} \setlength\itemsep{1.9em}
-  \item F.O.C.:\quad $p \cdot S'(p) \overset{!}{=} (1-p) \cdot S'(1-p)$
-  \item A second solution is $S'(p)=2(1-p)$, resulting in $2p(1-p)=2(1-p)p$ and $S(p)=-(1-p)^2=-\frac{1}{2}((1-p)^2+(0-(1-p))^2)$
-  \item This gives rise to the \textbf{quadratic scoring rule}, which for two classes is $S(\pi,y)=-\frac{1}{2} \sum_{i=1}^{2}(y_i-\pi_i)^2$\\ {\small (with $y_1=y, y_2=1-y$ and likewise $\pi_1=\pi, \pi_2=1-\pi$)}
-  \item Its positive counterpart is also called the \textbf{Brier score} (minimization) and is effectively the \textbf{MSE loss} for probabilities
+\begin{itemize} \setlength\itemsep{1.2em}
+  \item F.O.C.:\quad $p \cdot L'(p) \overset{!}{=} (1-p) \cdot L'(1-p)$
+  \item A second solution is $L'(p)=-2(1-p)$, resulting in $-2p(1-p)=-2(1-p)p$ and the antiderivative $L(p)=(1-p)^2=\frac{1}{2}((1-p)^2+(0-(1-p))^2)$
+  \item This is also called the \textbf{Brier score} and is effectively the \textbf{MSE loss} for probabilities $$L(y,\pi)=\frac{1}{2}\sum_{i=1}^{2}(y_i-\pi_i)^2$$
+  {\small (with $y_1=y, y_2=1-y$ and likewise $\pi_1=\pi, \pi_2=1-\pi$)}
+  \item Using positive orientation (maximization), this gives rise to the \textbf{quadratic scoring rule}, which for two classes is $S(\pi,y)=-\frac{1}{2} \sum_{i=1}^{2}(y_i-\pi_i)^2$
 \end{itemize}
 
 \end{vbframe}
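The hunk above rewrites the propriety derivation from scores S to losses L; a quick numerical sanity check makes the F.O.C. tangible: for both solutions found there (log loss and Brier score), the expected loss under a true probability p should be minimized exactly at pi = p. A minimal R sketch, illustrative only and not part of the patched rsrc scripts:

# Numerical sanity check of propriety (illustrative; not part of the patch):
# for a strictly proper loss, E_y[L(y, pi)] = p * L(1, pi) + (1 - p) * L(0, pi)
# should be minimized at pi = p.
log_loss   <- function(y, pi_hat) -(y * log(pi_hat) + (1 - y) * log(1 - pi_hat))
brier_loss <- function(y, pi_hat) 0.5 * ((y - pi_hat)^2 + ((1 - y) - (1 - pi_hat))^2)

expected_loss <- function(pi_hat, p, loss) p * loss(1, pi_hat) + (1 - p) * loss(0, pi_hat)

p       <- 0.3
pi_grid <- seq(0.01, 0.99, by = 0.01)
pi_grid[which.min(expected_loss(pi_grid, p, log_loss))]    # ~0.3
pi_grid[which.min(expected_loss(pi_grid, p, brier_loss))]  # ~0.3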
diff --git a/slides/advriskmin/slides-advriskmin-regression-further-losses.tex b/slides/advriskmin/slides-advriskmin-regression-further-losses.tex
index b62aae40..93348990 100644
--- a/slides/advriskmin/slides-advriskmin-regression-further-losses.tex
+++ b/slides/advriskmin/slides-advriskmin-regression-further-losses.tex
@@ -33,7 +33,7 @@
 \begin{vbframe}{Advanced Loss Functions}
 
 Special loss functions can be used to estimate non-standard posterior components,
- to measure errors in a custom way or are designed to have special properties like robustness.
+ to measure errors in a custom way, or because they are designed to have special properties like robustness.
 
 \vspace{1cm}
 
@@ -226,7 +226,7 @@
 
 \end{vbframe}
 
-\begin{comment}
+%\begin{comment}
 \begin{vbframe}{Log-Barrier Loss}
 
 \begin{small}
@@ -239,9 +239,9 @@
 \end{small}
 
 \begin{itemize}
-\item Behaves like $L2$ loss for small residuals.
-\item We use this if we don't want residuals larger than $\epsilon$ at all.
-\item No guarantee that the risk minimization problem has a solution.
+\item Behaves like $L2$ loss for small residuals
+\item We use this if we don't want residuals larger than $\epsilon$ at all
+\item No guarantee that the risk minimization problem has a solution
 \item Plot shows log-barrier loss for $\epsilon=2$:
 \end{itemize}
 
@@ -253,38 +253,6 @@
 
 \end{vbframe}
 
-
-
-\begin{vbframe}{Log-Barrier loss}
-
-\begin{itemize}
- % \item Similarly to the Huber loss, there is no closed-form solution for the optimal constant model $f = \thetab$ w.r.t. the log-barrier loss. Numerical optimization is necessary.
- \item Note that the optimization problem has no (finite) solution if there is no way to fit a constant where all residuals are smaller than $\epsilon$.
-\end{itemize}
-
-\vspace{0.2cm}
-
-\begin{center}
-% \includegraphics[width = 9cm ]{figure_man/log_barrier2.png}
-\includegraphics[width = \textwidth]{figure/loss_logbarrier_2.png}
-\end{center}
-
-% \framebreak
-%
-%
-% Note that the optimization problem has no (finite) solution if there is no way to fit a constant where all residuals are smaller than $a$.
-%
-% \vspace{- 0.2cm}
-%
-%
-% \begin{center}
-% \includegraphics[width = 0.8\textwidth]{figure_man/log_barrier_2_1.png} \\
-% \end{center}
-
-% We see that the constant model fitted w.r.t. Huber loss in fact lies between L1- and L2-Loss.
-
-\end{vbframe}
-
 \begin{vbframe}{$\eps$-Insensitive loss}
 
 \vspace*{-0.3cm}
 
 $$
 \Lxy = \begin{cases} 0  & \text{if } |y - \fx| \le \epsilon \\ |y - \fx| - \epsilon & \text{if } |y - \fx| > \epsilon \end{cases}, \quad \epsilon \in \R_{+}
 $$
 
 \begin{itemize}
-\item Modification of $L1$ loss, errors below $\epsilon$ accepted without penalty.
-\item Used in SVM regression.
-\item Properties: convex and not differentiable for $ y - f \in \{-\epsilon, \epsilon\}$.
+\item Modification of $L1$ loss, errors below $\epsilon$ accepted without penalty
+\item Used in SVM regression
+\item Properties: convex and not differentiable for $ y - f \in \{-\epsilon, \epsilon\}$
 \end{itemize}
 
 \vfill
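The $\eps$-insensitive loss whose bullets are touched above is easy to check against its case-wise definition. A minimal R sketch of that piecewise formula; the helper name eps_insensitive is illustrative and is not one of the repo's rsrc scripts:

# Epsilon-insensitive loss as defined in the frame above:
# zero inside the [-eps, eps] tube, |y - f| - eps outside.
eps_insensitive <- function(y, f, eps = 0.5) {
  r <- abs(y - f)
  ifelse(r <= eps, 0, r - eps)
}

# Residuals of 0 and 0.3 stay inside the tube (no penalty), 2.0 is charged 1.5
eps_insensitive(y = c(1.0, 1.3, 3.0), f = 1.0, eps = 0.5)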
@@ -308,7 +276,7 @@
 \end{center}
 
 \end{vbframe}
-\end{comment}
+%\end{comment}
 
 %% BB: der beweis hier sieht falsch aus...!
 %% \begin{vbframe}{$\epsilon$-insensitive Loss: Optimal Constant}
@@ -354,10 +322,10 @@
 \normalsize
 \begin{itemize}
 \item Extension of $L1$ loss (equal to $L1$ for $\alpha = 0.5$).
-\item Weights either positive or negative residuals more strongly.
+\item Weighs either positive or negative residuals more strongly
 \item $\alpha<0.5$ $(\alpha>0.5)$ penalty to over-estimation (under-estimation)
 \item Risk minimizer is (conditional)
-  $\alpha$-quantile (median for $\alpha=0.5$).
+  $\alpha$-quantile (median for $\alpha=0.5$)
 \end{itemize}
 
 \vfill
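The last hunk edits the quantile-loss bullets, whose key claim is that the risk minimizer is the (conditional) alpha-quantile. A small R sketch that checks this empirically for a constant model; the pinball helper below uses the standard quantile-loss formula, which the hunk itself does not spell out, so take it as an assumption:

# Quantile (pinball) loss: alpha < 0.5 penalizes over-estimation more strongly.
# Standard formula, assumed here since the hunk only shows the bullet points.
pinball <- function(y, f, alpha) {
  ifelse(y >= f, alpha * (y - f), (1 - alpha) * (f - y))
}

set.seed(1)
y <- rexp(10000)            # skewed data, so mean, median and quantiles differ
alpha <- 0.8
emp_risk <- function(theta) mean(pinball(y, theta, alpha))

# The constant minimizing empirical pinball risk is (close to) the
# empirical alpha-quantile of y.
optimize(emp_risk, interval = range(y))$minimum
quantile(y, probs = alpha)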