Added ic sheet advriskmin_2 #229

Open

wants to merge 14 commits into base: main
Binary file added exercises-pdf/ic_advriskmin_2.pdf
Binary file not shown.
Binary file added exercises-pdf/ic_sol_advriskmin_2.pdf
Binary file not shown.
16 changes: 16 additions & 0 deletions exercises/advriskmin/ex_rnw/ic_risk_minimizers_brier_score.Rnw
@@ -0,0 +1,16 @@
Now consider the Brier score:
$$
\Lpixy = \lbrier
$$

\begin{enumerate}
\item
What is $\pibayes_c$ (the optimal constant model in terms of the theoretical risk)?
\lz
\lz
\lz
\lz
\lz
\item
What is its risk?
\end{enumerate}
28 changes: 28 additions & 0 deletions exercises/advriskmin/ex_rnw/ic_risk_minimizers_log_loss.Rnw
@@ -0,0 +1,28 @@
The first loss function of interest is the log-loss:
$$
\Lpixy = \lcrossent
$$

\begin{enumerate}
\item
What is $\pibayes_c$ (the optimal constant model in terms of the theoretical risk)?
\lz
\lz
\lz
\lz
\lz
\item
What is its risk?
\lz
\lz
\lz
\lz
\lz
\item
What is $\thetah$ (the optimal constant model in terms of the \textit{empirical} risk)?
\lz
\lz
\lz
\lz
\lz
\end{enumerate}
35 changes: 35 additions & 0 deletions exercises/advriskmin/ex_rnw/sol_risk_minimizers_brier_score.Rnw
@@ -0,0 +1,35 @@
\begin{enumerate}
\item
The Brier score loss is given by $L\left(y, \pix\right) = \left(y-\pix\right)^{2}$.
\begin{align*}
\pibayes_{c} &= \argmin_{c} \E_{xy}\left[L\left(y, c\right)\right]\\
&= \argmin_{c} \E_{y}\left[\left(y - c\right)^{2}\right]\\
&= \argmin_{c} \E_{y}\left[y^{2} - 2yc + c^{2}\right]\\
&= \argmin_{c} \var_{y}(y) + \pi^{2} - 2c\pi + c^{2}\\
&= \argmin_{c} \pi (1-\pi) + \pi^{2} - 2c\pi + c^{2}\\
&= \argmin_{c} \pi - \pi^{2} + \pi^{2} - 2c\pi + c^{2}\\
&= \argmin_{c} \pi - 2c\pi + c^{2}\\
&= \argmin_{c} -2c\pi + c^{2}
\end{align*}
where we used $\var(y) = \E(y^2) - \left[\E(y)\right]^2$, i.e., $\E(y^{2}) = \var_{y}(y) + \pi^{2}$.\\
Taking the derivative with respect to $c$ and setting it to $0$:
\begin{align*}
&\Rightarrow
\pd{}{c} \left[-2c\pi + c^{2}\right] \overset{!}{=} 0\\
&\begin{alignedat}{2}
&\Rightarrow -2\pi + 2c &&= 0 \\
&\Rightarrow \pi &&= c \\
&\Rightarrow \pibayes_{c} &&= \P(y = 1)
\end{alignedat}
\end{align*}
\item
\begin{align*}
\risk_{B}(\pibayes_{c}) &= \E_{xy} \left[ L\left(y, \pi \right) \right]\\
&= \E_{y} \left[ \left(y-\pi\right)^{2} \right] \\
&= \E_{y} \left[y^{2} - 2y\pi + \pi^{2}\right] \\
&= \pi - 2\pi^{2}+\pi^{2} \\
&= \pi - \pi^2 \\
&= \pi(1-\pi) \\
&= \var_{y}(y)
\end{align*}
Here we used $\E\left[y^{2}\right] = \E\left[y\right] = \pi$, since $y \in \{0,1\}$ implies $y^{2} = y$.
\end{enumerate}
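
As a quick numerical cross-check of the two results above, a minimal sketch (not part of the derivation; the value $\pi = 0.3$ and the grid resolution are arbitrary choices):
<<brier-constant-check, eval = FALSE>>=
# Expected Brier score of a constant c for y ~ Bernoulli(pi):
# E[(y - c)^2] = pi * (1 - c)^2 + (1 - pi) * c^2.
pi_true <- 0.3
grid    <- seq(0.01, 0.99, by = 0.01)
risk    <- sapply(grid, function(c) pi_true * (1 - c)^2 + (1 - pi_true) * c^2)
grid[which.min(risk)]  # ~ 0.3  = pi, the optimal constant model
min(risk)              # ~ 0.21 = pi * (1 - pi), its risk
@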
51 changes: 51 additions & 0 deletions exercises/advriskmin/ex_rnw/sol_risk_minimizers_log_loss.Rnw
@@ -0,0 +1,51 @@
\begin{enumerate}
\item
\begin{align*}
\pibayes_{c} &= \argmin_{c \in \unitint} \E_{xy} \left[ L\left(y, c\right) \right] = \argmin_{c} \E_{y} \left[ L\left(y, c\right) \right]\\
&= \argmin_{c} \E_{y} \left[ -y \log \left(c\right) - \left(1-y\right) \log \left(1-c\right) \right] \\
&= \argmin_{c} - \log \left(c\right) \underbrace{\E_{y} \left[y\right]}_{=\P(y=1)=\pi} -\log \left(1-c\right) \underbrace{\E_{y} \left[1-y\right]}_{=1-\pi} \\
&= \argmin_{c} - \left[\pi \log \left(c\right) + \left(1-\pi\right) \log \left(1-c\right) \right]
\end{align*}
Taking the derivative with respect to $c$ and setting it to $0$:
\begin{align*}
&\Rightarrow \pd{}{c} \left[- \pi \log \left(c\right) - \left(1-\pi\right) \log \left(1-c\right) \right] \overset{!}{=} 0 \\
&\begin{alignedat}{2}
&\Rightarrow -\frac{\pi}{c} + \frac{1 - \pi}{1 - c} &&= 0 \\
&\Rightarrow c (1 - \pi) &&= (1 - c) \pi \\
&\Rightarrow c &&= \pi \\
&\Rightarrow \pibayes_{c} &&= \P(y = 1)
\end{alignedat}
\end{align*}

\item
\begin{align*}
\risk_{l}(\pibayes_{c}) &= \E_{xy} \left[ L\left(y, \pi \right) \right]\\
&= \E_{y} \left[ -y \log \left( \pi \right) - \left(1-y\right) \log \left(1 - \pi \right) \right] \\
&= - \pi \log(\pi) - (1-\pi) \log(1-\pi) \\
&= H(y) \text{ (= Entropy!)}
\end{align*}

\item
$\thetah$, the optimal constant model in terms of the \textit{empirical} risk, is given by $\thetah = \argmin_{\theta \in \Theta} \risk_{emp}(\theta)$.
\begin{align*}
\risk_{emp}(\theta) &= \sumin L\left(\yi, \theta\right)\\
&= \sumin \log \left(1 + \exp(-\yi \theta) \right)
\end{align*}
Here the log-loss is used in its margin-based form $L(y, \theta) = \log\left(1 + \exp(-y\theta)\right)$, i.e., with labels $\yi \in \{-1, +1\}$.
Taking the derivative:
\begin{align*}
\pd{}{\theta} \risk_{emp}(\theta) &= \sumin \frac{1}{1+\exp(-\yi\theta)} \cdot \exp(-\yi\theta) \cdot (-\yi)\\
&= - \sumin \yi \frac{\exp(-\yi\theta)}{1 + \exp(-\yi\theta)}\\
&= - \sum\limits_{i:\, \yi=1} \frac{\exp(-\theta)}{1+\exp(-\theta)} + \sum\limits_{i:\, \yi=-1} \frac{\exp(\theta)}{1+\exp(\theta)}\\
&\overset{!}{=} 0
\end{align*}
This is equivalent to:
\begin{align*}
\sum\limits_{i:\, \yi=-1} \frac{\exp(\theta)}{1+\exp(\theta)} &= \sum\limits_{i:\, \yi=1} \frac{\exp(-\theta)}{1+\exp(-\theta)}\\
n_{-} \frac{\exp(\theta)}{1+\exp(\theta)} &= n_{+} \frac{1}{1+\exp(\theta)}\\
\frac{n_{+}}{n_{-}} &= \exp(\theta)\\
\theta &= \log\left(\frac{n_{+}}{n_{-}}\right)
\end{align*}
Here $n_{+}$ and $n_{-}$ denote the number of observations with $\yi = 1$ and $\yi = -1$, respectively, and we used $\frac{\exp(-\theta)}{1+\exp(-\theta)} = \frac{1}{1+\exp(\theta)}$.


\end{enumerate}
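
Both the theoretical and the empirical minimizer above can be verified numerically; the following is a minimal sketch (the value $\pi = 0.3$, the grid, and the class counts 30/70 are arbitrary choices):
<<logloss-check, eval = FALSE>>=
# Theoretical part: expected log-loss of a constant c for y ~ Bernoulli(pi)
# is -pi * log(c) - (1 - pi) * log(1 - c); the minimizer should be c = pi.
pi_true <- 0.3
grid    <- seq(0.01, 0.99, by = 0.01)
risk    <- sapply(grid, function(c) -pi_true * log(c) - (1 - pi_true) * log(1 - c))
grid[which.min(risk)]  # ~ 0.3 = pi
min(risk)              # ~ entropy of Bernoulli(0.3)

# Empirical part (labels in {-1, +1}): the minimizer of
# sum_i log(1 + exp(-y_i * theta)) should be log(n_plus / n_minus).
y        <- c(rep(1, 30), rep(-1, 70))
emp_risk <- function(theta) sum(log(1 + exp(-y * theta)))
optimize(emp_risk, interval = c(-10, 10))$minimum  # ~ log(30 / 70)
log(30 / 70)
@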
67 changes: 67 additions & 0 deletions exercises/advriskmin/ic_advriskmin_2.Rnw
@@ -0,0 +1,67 @@
% !Rnw weave = knitr

<<setup-child, include = FALSE>>=
library('knitr')
knitr::set_parent("../../style/preamble_ueb.Rnw")
@


\kopficsl{}{Risk Minimization - Classification}

Consider the binary classification setting with label space $\mathcal{Y}=\setzo$, some feature space $\Xspace$, and the hypothesis space $\Hspace = \{ \pi:\Xspace \to \unitint \,|\, \pix = s(\thetav^T\xv), \thetav \in \Theta \}$ of logistic regression, where $s$ denotes the logistic (sigmoid) function.

\lz

\aufgabe{Risk Minimizers for the Log-Loss}{
<<child="ex_rnw/ic_risk_minimizers_log_loss.Rnw">>=
@
}

\aufgabe{Risk Minimizers for the Brier Score}{
<<child="ex_rnw/ic_risk_minimizers_brier_score.Rnw">>=
@
}

\newpage
\newgeometry{left=1cm, right=1cm, top=1cm, bottom=1cm}
\begin{sidewaystable}[ht]
\centering
\renewcommand{\arraystretch}{5}
\setlength{\tabcolsep}{8pt}
\resizebox{\textwidth}{!}{
\begin{tabular}{|>{\centering\arraybackslash}m{1.5cm}|>{\centering\arraybackslash}m{4.5cm}|>{\centering\arraybackslash}m{4.5cm}|>{\centering\arraybackslash}m{4.5cm}|>{\centering\arraybackslash}m{4.5cm}|}
\hline
\textbf{\large Loss} &
\textbf{\large Risk minimizer} &
\textbf{\large Bayes risk} &
\textbf{\large Optimal \newline constant model} &
\textbf{\large Risk of optimal \newline constant model} \\ \hline

\rule{0pt}{2cm}\textbf{\Large{L2}} &
&
&
&
\\ \hline

\rule{0pt}{2cm}\textbf{\Large{0/1}} &
&
&
&
\\ \hline

\rule{0pt}{2cm}\textbf{\Large{Log}} &
&
&
&
\\ \hline

\rule{0pt}{2cm}\textbf{\Large{Brier}} &
&
&
&
\\ \hline

\end{tabular}
}
\end{sidewaystable}
\restoregeometry
58 changes: 58 additions & 0 deletions exercises/advriskmin/ic_sol_advriskmin_2.Rnw
@@ -0,0 +1,58 @@
% !Rnw weave = knitr

<<setup-child, include = FALSE>>=
library('knitr')
knitr::set_parent("../../style/preamble_ueb.Rnw")
@


\kopficsl{}{Risk Minimization - Classification}

\loesung{Risk Minimizers for the Log-Loss}{
<<child="ex_rnw/sol_risk_minimizers_log_loss.Rnw">>=
@
}

\loesung{Risk Minimizers for the Brier Score}{
<<child="ex_rnw/sol_risk_minimizers_brier_score.Rnw">>=
@
}

\newpage
\newgeometry{left=1cm, right=1cm, top=1cm, bottom=1cm}
\begin{sidewaystable}[ht]
\centering
\renewcommand{\arraystretch}{5}
\setlength{\tabcolsep}{8pt}
\resizebox{\textwidth}{!}{
\begin{tabular}{|>{\centering\arraybackslash}m{1.5cm}|>{\centering\arraybackslash}m{4.5cm}|>{\centering\arraybackslash}m{4.5cm}|>{\centering\arraybackslash}m{4.5cm}|>{\centering\arraybackslash}m{4.5cm}|}
\hline
\textbf{\Large Loss} &
\textbf{\Large Risk minimizer} &
\textbf{\Large Bayes risk} &
\textbf{\Large Optimal \newline constant model} &
\textbf{\Large Risk of optimal \newline constant model} \\ \hline
\rule{0pt}{2cm}\textbf{\Large{L2}} &
\normalsize{$\E_{y|\xv}\left(y|\xv\right) = \fxbayes$} &
\normalsize{$\riskbayes_{L2} = \E_x[\var_{y|x}(y|x)]$} &
\normalsize{$\E_y[y] = \fbayes_c$} &
\normalsize{$\var_y(y) = \risk_{L2}(\fbayes_c)$} \\ \hline
\rule{0pt}{2cm}\textbf{\Large{0/1}} &
\normalsize{$\hxbayes = \argmax_{C \in \Yspace} \P(y = C | \xv = \xv)$} &
\normalsize{$\riskbayes_{0/1} = 1 - \E_x[\max_{C \in \Yspace}\P(y=C|\xv = \xv)]$} &
\normalsize{Exercise 2} &
\normalsize{Exercise 2} \\ \hline
\rule{0pt}{2cm}\textbf{\Large{Log}} &
\normalsize{$\pixbayes = \P(y = 1 | \xv = \xv)$} &
\normalsize{$\riskbayes_{l} = \E_x[\text{H}_{y|x}(y|x)]$} \newline exp. cond. entropy (ch. 13) &
\normalsize{$\pibayes_c = \P(y = 1)$} &
\normalsize{$\text{H}_y(y) = \risk_{l}(\pibayes_c)$} \\ \hline
\rule{0pt}{2cm}\textbf{\Large{Brier}} &
\normalsize{$\pixbayes = \P(y = 1 | \xv = \xv)$} &
\normalsize{$\riskbayes_{B} = \E_x[\var_{y|x}(y|x)]$} \newline $\left(= \riskbayes_{L2}\right)$ &
\normalsize{$\pibayes_c = \P(y = 1)$} &
\normalsize{$\var_y(y) = \risk_B(\pibayes_c)$} \\ \hline
\end{tabular}
}
\end{sidewaystable}
\restoregeometry
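
The L2 row of the table above can be cross-checked with simulated data; a minimal sketch (the normal distribution and sample size are arbitrary choices):
<<l2-row-check, eval = FALSE>>=
# For the L2 loss, the optimal constant model is E[y] and its risk is Var(y).
set.seed(1)
y    <- rnorm(1e5, mean = 2, sd = 1.5)
grid <- seq(-1, 5, by = 0.01)
risk <- sapply(grid, function(c) mean((y - c)^2))
grid[which.min(risk)]  # ~ 2    = E[y]
min(risk)              # ~ 2.25 = Var(y)
@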
43 changes: 43 additions & 0 deletions exercises/gaussian-processes/ex_rnw/ic_sol_gp_2.Rnw
@@ -0,0 +1,43 @@
\begin{enumerate}
\item
\begin{align*}
\Kmat &=
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}\\
\bm{K}_*^T &= \begin{pmatrix}0.6\\0\\0.3\\0\end{pmatrix}^T\\
m_{\text{post}} &= \begin{pmatrix}0.6\\0\\0.3\\0\end{pmatrix}^T
\begin{pmatrix}
(1+\sigma^2)^{-1} & 0 & 0 & 0 \\
0 & (1+\sigma^2)^{-1} & 0 & 0 \\
0 & 0 & (1+\sigma^2)^{-1} & 0 \\
0 & 0 & 0 & (1+\sigma^2)^{-1}
\end{pmatrix}
\bm{y}\\
&= \begin{pmatrix}\frac{0.6}{1+\sigma^2}&0&\frac{0.3}{1+\sigma^2}&0\end{pmatrix}
\begin{pmatrix}
3\\
3.3\\
2.0\\
2.7
\end{pmatrix}\\
&= \frac{1.8}{1+\sigma^2} + \frac{0.6}{1+\sigma^2} = \frac{2.4}{1+\sigma^2}
\end{align*}
\item
\begin{align*}
k_\text{post} &= 1 - \begin{pmatrix}\frac{0.6}{1+\sigma^2}&0&\frac{0.3}{1+\sigma^2}&0\end{pmatrix}
\begin{pmatrix}
0.6\\
0\\
0.3\\
0
\end{pmatrix}\\
&= 1 - \left(\frac{0.36}{1+\sigma^2} + \frac{0.09}{1+\sigma^2}\right) = 1 - \frac{0.45}{1+\sigma^2}
\end{align*}
\item Repeat the calculations from (a) and (b), using the test input $x_* = \xi$ for each $i = 1, 2, 3, 4$.
\item Based on your calculations so far, try to sketch the posterior Gaussian process.
\item If the nugget $\sigma^2$ were zero, what would the posterior Gaussian process (roughly) look like?
\end{enumerate}
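
The computations in (a) and (b) can be reproduced in a few lines; a minimal sketch ($\sigma^2 = 0.1$ is an arbitrary example value for the nugget):
<<gp-posterior-check, eval = FALSE>>=
sigma2 <- 0.1
K      <- diag(4)             # prior covariance of the training inputs
k_star <- c(0.6, 0, 0.3, 0)   # covariances between x_* and the training inputs
y      <- c(3, 3.3, 2.0, 2.7)

K_inv  <- solve(K + sigma2 * diag(4))
m_post <- t(k_star) %*% K_inv %*% y            # = 2.4 / (1 + sigma2)
k_post <- 1 - t(k_star) %*% K_inv %*% k_star   # = 1 - 0.45 / (1 + sigma2)
m_post
k_post
@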
14 changes: 14 additions & 0 deletions exercises/gaussian-processes/ic_sol_gp_2.Rnw
@@ -0,0 +1,14 @@
% !Rnw weave = knitr

<<setup-child, include = FALSE>>=
library('knitr')
knitr::set_parent("../../style/preamble_ueb.Rnw")
@

\kopficsl{}{Gaussian Processes}


\loesung{Gaussian Processes - Prediction}{
<<child="ex_rnw/ic_sol_gp_2.Rnw">>=
@
}
3 changes: 3 additions & 0 deletions style/preamble_ueb.Rnw
@@ -5,6 +5,7 @@
\usepackage[utf8]{inputenc}
%\usepackage[ngerman]{babel}
\usepackage{a4wide,paralist}
\usepackage{geometry}
\usepackage{amsmath, amssymb, xfrac, amsthm}
\usepackage{dsfont}
%\usepackage[usenames,dvipsnames]{xcolor}
@@ -23,6 +24,8 @@
\usepackage{bm}
\usepackage{algorithm}
\usepackage{algpseudocode}
\usepackage{rotating}
\usepackage{array}


\input{../../style/common}