diff --git a/slides/advriskmin/slides-advriskmin-classification-bernoulli.tex b/slides/advriskmin/slides-advriskmin-classification-bernoulli.tex
index e2a1ae20..c608d72a 100644
--- a/slides/advriskmin/slides-advriskmin-classification-bernoulli.tex
+++ b/slides/advriskmin/slides-advriskmin-classification-bernoulli.tex
@@ -231,9 +231,6 @@
 
-
-
-
 % \vspace*{-0.2cm}
 % \begin{eqnarray*}
diff --git a/slides/advriskmin/slides-advriskmin-classification-brier.tex b/slides/advriskmin/slides-advriskmin-classification-brier.tex
index 7bc34838..0a5b54fe 100644
--- a/slides/advriskmin/slides-advriskmin-classification-brier.tex
+++ b/slides/advriskmin/slides-advriskmin-classification-brier.tex
@@ -144,71 +144,6 @@
 \end{vbframe}
 
-\begin{vbframe}{Brier score minimization = Gini splitting}
-
-When fitting a tree we minimize the risk within each node $\Np$ by risk minimization and predict the optimal constant. Another approach that is common in literature is to minimize the average node impurity $\text{Imp}(\Np)$.
-
-\vspace*{0.2cm}
-
-\textbf{Claim:} Gini splitting $\text{Imp}(\Np) = \sum_{k=1}^g \pikN \left(1-\pikN \right)$ is equivalent to the Brier score minimization.
-
-\begin{footnotesize}
-Note that $\pikN := \frac{1}{n_{\Np}} \sum\limits_{(\xv,y) \in \Np} [y = k]$
-\end{footnotesize}
-
-\vspace*{0.2cm}
-
-\begin{footnotesize}
-
-\textbf{Proof: } We show that the risk related to a subset of observations $\Np \subseteq \D$ fulfills
-
-
-$$
-  \risk(\Np) = n_\Np \text{Imp}(\Np),
-$$
-
- where $\text{Imp}$ is the Gini impurity and $\risk(\Np)$ is calculated w.r.t. the (multiclass) Brier score
-
-
-$$
-  L(y, \pix) = \sum_{k = 1}^g \left([y = k] - \pi_k(\xv)\right)^2.
-$$
-
-\framebreak
-
-\vspace*{-0.5cm}
-\begin{eqnarray*}
-\risk(\Np) &=& \sum_{\xy \in \Np} \sum_{k = 1}^g \left([y = k] - \pi_k(\xv)\right)^2
-= \sum_{k = 1}^g \sum_{\xy \in \Np} \left([y = k] - \frac{n_{\Np,k}}{n_{\Np }}\right)^2,
-\end{eqnarray*}
-
-by plugging in the optimal constant prediction w.r.t. the Brier score ($n_{\Np,k}$ is defined as the number of class $k$ observations in node $\Np$):
-$$\hat \pi_k(\xv)= \pikN = \frac{1}{n_{\Np}} \sum\limits_{(\xv,y) \in \Np} [y = k] = \frac{n_{\Np,k}}{n_{\Np }}. $$
-
- We split the inner sum and further simplify the expression
-
-\begin{eqnarray*}
-&=& \sum_{k = 1}^{g} \left(\sum_{\xy \in \Np: ~ y = k} \left(1 - \frac{n_{\Np,k}}{n_{\Np }}\right)^2 + \sum_{\xy \in \Np: ~ y \ne k} \left(0 - \frac{n_{\Np,k}}{n_{\Np }}\right)^2\right) \\
-&=& \sum_{k = 1}^g n_{\Np,k}\left(1 - \frac{n_{\Np,k}}{n_{\Np }}\right)^2 + (n_{\Np } - n_{\Np,k})\left(\frac{n_{\Np,k}}{n_{\Np }}\right)^2,
-\end{eqnarray*}
-
-since for $n_{\Np,k}$ observations the condition $y = k$ is met, and for the remaining $(n_\Np - n_{\Np,k})$ observations it is not.
-
-
-We further simplify the expression to
-
-% \begin{footnotesize}
-\begin{eqnarray*}
-\risk(\Np) &=& \sum_{k = 1}^g n_{\Np,k}\left(\frac{n_{\Np } - n_{\Np,k}}{n_{\Np }}\right)^2 + (n_{\Np } - n_{\Np,k})\left(\frac{n_{\Np,k}}{n_{\Np }}\right)^2 \\
-&=& \sum_{k = 1}^g \frac{n_{\Np,k}}{n_{\Np }} \frac{n_{\Np } - n_{\Np,k}}{n_{\Np }} \left(n_{\Np } - n_{\Np,k } + n_{\Np,k}\right) \\
-&=& n_{\Np } \sum_{k = 1}^g \pikN \cdot \left(1 - \pikN \right) = n_\Np \text{Imp}(\Np).
-\end{eqnarray*}
-% \end{footnotesize}
-
-\end{footnotesize}
-
-\end{vbframe}
-
 \endlecture
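
The frame removed in the second hunk proves the identity $\risk(\Np) = n_\Np \, \text{Imp}(\Np)$: minimizing the multiclass Brier score with the optimal constant prediction per node is the same as Gini impurity splitting. As a quick numerical sanity check of that identity, here is a minimal NumPy sketch; it is not part of the slides, and the function names `brier_risk` and `gini_impurity` are hypothetical helpers introduced only for illustration.

```python
import numpy as np

def brier_risk(y, g):
    """Brier risk of a node under the optimal constant prediction.

    y : integer class labels (0, ..., g-1) of the observations in the node
    g : number of classes
    """
    n = len(y)
    # optimal constant prediction: class frequencies pi_k = n_{N,k} / n_N
    pi = np.bincount(y, minlength=g) / n
    onehot = np.eye(g)[y]              # rows encode the indicator [y = k]
    # sum over observations and classes of ([y = k] - pi_k)^2
    return np.sum((onehot - pi) ** 2)

def gini_impurity(y, g):
    """Gini impurity Imp(N) = sum_k pi_k (1 - pi_k)."""
    pi = np.bincount(y, minlength=g) / len(y)
    return np.sum(pi * (1 - pi))

# check risk(N) == n_N * Imp(N) on random labels in a hypothetical node N
rng = np.random.default_rng(0)
g = 4
y = rng.integers(0, g, size=50)
print(np.isclose(brier_risk(y, g), len(y) * gini_impurity(y, g)))  # True
```

The check holds for any label vector, which mirrors the slide's algebraic argument: plugging the class frequencies into the Brier score and splitting the inner sum by $y = k$ versus $y \ne k$ collapses to $n_\Np \sum_k \pikN (1 - \pikN)$.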