
Commit

updates for proper scoring rules and telephone data plot legend fix
ludwigbothmann committed Aug 30, 2024
1 parent f331bae commit d3e5627
Showing 4 changed files with 31 additions and 59 deletions.
Binary file modified slides/advriskmin/figure/telephone-data.pdf
2 changes: 1 addition & 1 deletion slides/advriskmin/rsrc/telephone-data.R
@@ -74,7 +74,7 @@ res.tel$pred_logcosh <- preds.logcosh
p <- ggplot(res.tel, aes(x = year)) +
geom_line(aes(y = pred_l2, color = "L2 (OLS)"), size = 1.6, alpha=1) +
#geom_line(aes(y = pred_l1, color = "L1"), size = 1.4) +
geom_line(aes(y = pred_l1_manual, color = "L1 (man)"), size = 1.6, alpha=1) +
geom_line(aes(y = pred_l1_manual, color = "L1"), size = 1.6, alpha=1) +
geom_line(aes(y = pred_huber, color = "Huber"), size = 1.6, alpha=1) +
geom_line(aes(y = pred_logcosh, color = "Log-Cosh"), size = 1.6, alpha=1) +
geom_point(aes(y = calls), color = "black", size = 4, alpha=1) +
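For reference, a minimal sketch of how the renamed legend entries could be tied to a fixed palette; scale_color_manual is standard ggplot2, but the colors and the legend title below are illustrative assumptions, not taken from telephone-data.R:

# Illustrative only: assumed colors and legend title for the four loss curves.
p <- p + scale_color_manual(
  name   = "Loss",
  values = c("L2 (OLS)" = "#E69F00", "L1" = "#56B4E9",
             "Huber"    = "#009E73", "Log-Cosh" = "#CC79A7")
)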
34 changes: 19 additions & 15 deletions slides/advriskmin/slides-advriskmin-proper-scoring-rules.tex
@@ -105,38 +105,42 @@

\begin{vbframe}{Binary classification scores}

To find strictly proper scores, we can ask: Which functions have the property such that $\E_y[S(\pi,y)]$ is maximized at $\pi=p$? We have
To find strictly proper scores/losses, we can ask: Which functions have the property that $\E_y[L(y,\pi)]$ is minimized at $\pi=p$? We have

$$\E_{y}[S(\pi, y)]=p \cdot S(\pi, 1) + (1-p) \cdot S(\pi, 0)$$
$$\E_{y}[L(y,\pi)]=p \cdot L(1,\pi) + (1-p) \cdot L(0,\pi)$$

Let's further restrict our search to scores $S(\pi,y)$ for which $S(\pi,1)=S(\pi)$ and $S(\pi, 0)=S(1-\pi)$:
Let's further %restrict our search to scores
assume that $L(1,\pi)$ and $L(0, \pi)$ cannot be arbitrary, but are the same function evaluated at $\pi$ and $1-\pi$:
%$S(\pi,y)$ for which
$L(1,\pi)=L(\pi)$ and $L(0,\pi)=L(1-\pi)$. Then

$$\E_{y}[S(\pi, y)]=p \cdot S(\pi) + (1-p) \cdot S(1-\pi)$$
$$\E_{y}[L(y,\pi)]=p \cdot L(\pi) + (1-p) \cdot L(1-\pi)$$

\vspace{0.2cm}

Setting the derivative w.r.t. $\pi$ to $0$ and requiring $\pi=p$ at the optimum, we get the following first-order condition:
Setting the derivative w.r.t. $\pi$ to $0$ and requiring $\pi=p$ at the optimum (\textbf{propriety}), we get the following first-order condition (F.O.C.):

\vspace{0.3cm}

$$p \cdot S'(p) \overset{!}{=} (1-p) \cdot S'(1-p)$$
$$p \cdot L'(p) \overset{!}{=} (1-p) \cdot L'(1-p)$$
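The intermediate differentiation step, spelled out (the minus sign on the second term comes from the chain rule applied to $1-\pi$):

$$\frac{d}{d\pi}\,\E_{y}[L(y,\pi)] = p \cdot L'(\pi) - (1-p) \cdot L'(1-\pi) \overset{!}{=} 0 \quad \text{at } \pi = p$$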

\framebreak

\begin{itemize}\setlength\itemsep{1.9em}
\item F.O.C.:\quad $p \cdot S'(p) \overset{!}{=} (1-p) \cdot S'(1-p)$
\item One natural solution is $S'(p)=1/p$, resulting in $p/p=(1-p)/(1-p)=1$ and $S(p)=\log(p)$.
\item This is the \textbf{logarithmic scoring rule} $S(\pi,y)=y \cdot \log(\pi) + (1-y) \cdot \log(1-\pi)$
\item Under (loss) minimization this corresponds to the \textbf{log loss} $L(y,\pi)=-(y \cdot \log(\pi) + (1-y) \cdot \log(1-\pi))$
\item F.O.C.:\quad $p \cdot L'(p) \overset{!}{=} (1-p) \cdot L'(1-p)$
\item One natural solution is $L'(p)=-1/p$, resulting in $-p/p=-(1-p)/(1-p)=-1$ and the antiderivative $L(p)=-\log(p)$.
\item This is the \textbf{log loss} $$L(y,\pi)=-(y \cdot \log(\pi) + (1-y) \cdot \log(1-\pi))$$
\item The corresponding scoring rule (maximization) is the strictly proper \textbf{logarithmic scoring rule} $$S(\pi,y)=y \cdot \log(\pi) + (1-y) \cdot \log(1-\pi)$$
\end{itemize}
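A minimal R check of the log-loss solution above, verifying numerically that the expected loss is minimized at $\pi=p$; the value of p is an arbitrary illustration:

# Expected log loss for true class-1 probability p, minimized over pi.
p <- 0.3
exp_logloss <- function(pi) -(p * log(pi) + (1 - p) * log(1 - pi))
optimize(exp_logloss, interval = c(1e-6, 1 - 1e-6))$minimum  # approx. 0.3 = p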

\framebreak

\begin{itemize} \setlength\itemsep{1.9em}
\item F.O.C.:\quad $p \cdot S'(p) \overset{!}{=} (1-p) \cdot S'(1-p)$
\item A second solution is $S'(p)=2(1-p)$, resulting in $2p(1-p)=2(1-p)p$ and $S(p)=-(1-p)^2=-\frac{1}{2}((1-p)^2+(0-(1-p))^2)$
\item This gives rise to the \textbf{quadratic scoring rule}, which for two classes is $S(\pi,y)=-\frac{1}{2} \sum_{i=1}^{2}(y_i-\pi_i)^2$\\ {\small (with $y_1=y, y_2=1-y$ and likewise $\pi_1=\pi, \pi_2=1-\pi$)}
\item Its positive counterpart is also called the \textbf{Brier score} (minimization) and is effectively the \textbf{MSE loss} for probabilities
\begin{itemize} \setlength\itemsep{1.2em}
\item F.O.C.:\quad $p \cdot L'(p) \overset{!}{=} (1-p) \cdot L'(1-p)$
\item A second solution is $L'(p)=-2(1-p)$, resulting in $-2p(1-p)=-2(1-p)p$ and the antiderivative $L(p)=(1-p)^2=\frac{1}{2}((1-p)^2+(0-(1-p))^2)$
\item This is also called the \textbf{Brier score} and is effectively the \textbf{MSE loss} for probabilities $$L(y,\pi)=\frac{1}{2}\sum_{i=1}^{2}(y_i-\pi_i)^2$$
{\small (with $y_1=y, y_2=1-y$ and likewise $\pi_1=\pi, \pi_2=1-\pi$)}
\item Using positive orientation (maximization), this gives rise to the \textbf{quadratic scoring rule}, which for two classes is $S(\pi,y)=-\frac{1}{2} \sum_{i=1}^{2}(y_i-\pi_i)^2$
\end{itemize}
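The same numeric check for the Brier score, using the form $L(\pi)=(1-\pi)^2$ and $L(1-\pi)=\pi^2$ derived above:

# Expected Brier loss; the minimizer should again be pi = p.
p <- 0.3
exp_brier <- function(pi) p * (1 - pi)^2 + (1 - p) * pi^2
optimize(exp_brier, interval = c(0, 1))$minimum  # approx. 0.3 = p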

\end{vbframe}
54 changes: 11 additions & 43 deletions slides/advriskmin/slides-advriskmin-regression-further-losses.tex
@@ -33,7 +33,7 @@

\begin{vbframe}{Advanced Loss Functions}
Special loss functions can be used to estimate non-standard posterior components,
to measure errors in a custom way or are designed to have special properties like robustness.
to measure errors in a custom way, or which are designed to have special properties like robustness.

\vspace{1cm}

@@ -226,7 +226,7 @@

\end{vbframe}

\begin{comment}
%\begin{comment}
\begin{vbframe}{Log-Barrier Loss}

\begin{small}
@@ -239,9 +239,9 @@
\end{small}

\begin{itemize}
\item Behaves like $L2$ loss for small residuals.
\item We use this if we don't want residuals larger than $\epsilon$ at all.
\item No guarantee that the risk minimization problem has a solution.
\item Behaves like $L2$ loss for small residuals
\item We use this if we don't want residuals larger than $\epsilon$ at all
\item No guarantee that the risk minimization problem has a solution
\item Plot shows log-barrier loss for $\epsilon=2$:
\end{itemize}

@@ -253,38 +253,6 @@

\end{vbframe}
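To illustrate the bullet points above, a hedged R sketch of the log-barrier loss: the parameterization $L(r) = -\epsilon^2 \log(1-(r/\epsilon)^2)$ for $|r|<\epsilon$ is an assumption (the exact definition sits in the collapsed part of the slide), but it reproduces the stated behavior, approximately $L2$ near zero and unbounded as residuals approach $\epsilon=2$:

# Assumed log-barrier parameterization; infinite outside (-eps, eps).
eps <- 2
log_barrier <- function(r) ifelse(abs(r) < eps, -eps^2 * log(1 - (r / eps)^2), Inf)
curve(log_barrier(x), from = -1.99, to = 1.99, xlab = "residual", ylab = "loss")
curve(x^2, add = TRUE, lty = 2)  # L2 for comparison: nearly identical near 0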



\begin{vbframe}{Log-Barrier loss}

\begin{itemize}
% \item Similarly to the Huber loss, there is no closed-form solution for the optimal constant model $f = \thetab$ w.r.t. the log-barrier loss. Numerical optimization is necessary.
\item Note that the optimization problem has no (finite) solution if there is no way to fit a constant where all residuals are smaller than $\epsilon$.
\end{itemize}

\vspace{0.2cm}

\begin{center}
% \includegraphics[width = 9cm ]{figure_man/log_barrier2.png}
\includegraphics[width = \textwidth]{figure/loss_logbarrier_2.png}
\end{center}

% \framebreak
%
%
% Note that the optimization problem has no (finite) solution if there is no way to fit a constant where all residuals are smaller than $a$.
%
% \vspace{- 0.2cm}
%
%
% \begin{center}
% \includegraphics[width = 0.8\textwidth]{figure_man/log_barrier_2_1.png} \\
% \end{center}

% We see that the constant model fitted w.r.t. Huber loss in fact lies between L1- and L2-Loss.

\end{vbframe}

\begin{vbframe}{$\eps$-Insensitive loss}

\vspace*{-0.3cm}
@@ -295,9 +295,9 @@
\end{cases}, \quad \epsilon \in \R_{+}
$$
\begin{itemize}
\item Modification of $L1$ loss, errors below $\epsilon$ accepted without penalty.
\item Used in SVM regression.
\item Properties: convex and not differentiable for $ y - f \in \{-\epsilon, \epsilon\}$.
\item Modification of $L1$ loss, errors below $\epsilon$ accepted without penalty
\item Used in SVM regression
\item Properties: convex and not differentiable for $ y - f \in \{-\epsilon, \epsilon\}$
\end{itemize}
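A small R sketch of the $\epsilon$-insensitive loss as used in SVM regression, assuming the usual form $\max(0, |y-f|-\epsilon)$, consistent with the piecewise definition above:

# eps-insensitive loss: zero penalty inside the eps-tube, L1-like outside.
eps <- 0.5
eps_insensitive <- function(r) pmax(abs(r) - eps, 0)
eps_insensitive(c(-1, -0.25, 0, 0.3, 2))  # 0.5 0.0 0.0 0.0 1.5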

\vfill
@@ -308,7 +308,7 @@
\end{center}

\end{vbframe}
\end{comment}
%\end{comment}

%% BB: der beweis hier sieht falsch aus...!
%% \begin{vbframe}{$\epsilon$-insensitive Loss: Optimal Constant}
@@ -354,10 +354,10 @@
\normalsize
\begin{itemize}
\item Extension of $L1$ loss (equal to $L1$ for $\alpha = 0.5$).
\item Weights either positive or negative residuals more strongly.
\item Weighs either positive or negative residuals more strongly
\item $\alpha<0.5$ $(\alpha>0.5)$ penalty to over-estimation (under-estimation)
\item Risk minimizer is (conditional)
$\alpha$-quantile (median for $\alpha=0.5$).
$\alpha$-quantile (median for $\alpha=0.5$)
\end{itemize}
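A hedged R sketch of the quantile (pinball) loss, checking numerically that the optimal constant prediction is the empirical $\alpha$-quantile; the data and the value of $\alpha$ below are illustrative:

# Pinball loss: weight alpha on under-estimation, (1 - alpha) on over-estimation.
alpha <- 0.25
pinball <- function(y, f) ifelse(y >= f, alpha * (y - f), (1 - alpha) * (f - y))
set.seed(1)
y <- rexp(1000)
risk <- function(f) mean(pinball(y, f))
optimize(risk, interval = range(y))$minimum  # close to quantile(y, alpha)
quantile(y, alpha)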

\vfill
