Add some more measure theory content

Signed-off-by: Thomas Gassmann <tgassmann@student.ethz.ch>
thomasgassmann · Feb 26, 2025 · fafdb92
1 parent 1967301
commit fafdb92
Showing 1 changed file with 31 additions and 2 deletions.
diff --git a/large-language-models/main.typ b/large-language-models/main.typ
@@ -36,9 +36,12 @@ An *energy function* is a function $hat(p) : Sigma^ast arrow RR$.
 Any normalizable energy function $hat(p)_"GN"$ (meaning $Z_G$ is finite) induces a language model, i.e., a distribution over $Sigma^ast$.
 
 #colorbox(title: [Sequence model], color: silver)[
-  For an alphabet $Sigma$ a sequence model is defined as a set of conditional probability distributions $p_"SM"(y | bold(y))$ for $y in Sigma, bold(y) in Sigma^ast$. $bold(y)$ is called history/context.
+  For an alphabet $Sigma$, a sequence model is a set of conditional probability distributions $p_"SM" (y | bold(y))$ for $y in Sigma$ (in most cases we use
+  $overline(Sigma) = Sigma union \{ EOS \}$ instead) and $bold(y) in Sigma^ast$, where $bold(y)$ is called the history or context. That is, we have $\{ p_"SM" (y | bold(y)) \}_(bold(y) in Sigma^ast)$ for $y in overline(Sigma)$.
 ]
 
+A sequence model can also be viewed as a probability distribution over $Sigma^ast union Sigma^infinity$, i.e. over both finite and infinite sequences.
+
 #colorbox(title: [Locally normalized model /autoregressive model])[
   For $p_"SM"$ a sequence model over $overline(Sigma)$: A locally normalized language model (LNM) over $Sigma$ is defined as:
   $
@@ -55,4 +58,30 @@ Any normalizable energy function $hat(p)_"GN"$ (meaning $Z_G$ is finite) induces
   i.e. the cumulative probability of all strings in the language beginning with $bold(y)$.
 ]
 
-Any language model can be locally normalized. 
+Any language model can be locally normalized. TODO: learn the proof (telescoping product of ratios of prefix probabilities).
+
+== Measure theory
+
+A probability space is a triple $(Omega, cal(F), bb(P))$, where $Omega$ is the sample space, $cal(F) subset.eq cal(P) (Omega)$ is the set of events and $bb(P)$ is a measure with $bb(P) [Omega] = 1$. To resolve e.g. the paradox of the infinite coin toss, we require $cal(F)$ to be a $sigma$-algebra.
+
+#colorbox(title: [$sigma$-algebra], color: silver)[
+  A set $cal(F) subset.eq cal(P) (Omega)$ is called a $sigma$-algebra over $Omega$ if:
+  - $Omega in cal(F)$
+  - $Sigma in cal(F) arrow.r.double Sigma^complement in cal(F)$
+  - If $Sigma_1, Sigma_2, dots in cal(F)$, then $union.big_(n=1)^infinity Sigma_n in cal(F)$
+]
+
+#colorbox(title: [Probability measure], color: silver)[
+  A probability measure $bb(P)$ over a measurable space $(Omega, cal(F))$ is a function $bb(P): cal(F) arrow [0,1]$ s.t.:
+  - $bb(P) [Omega] = 1$
+  - If $Sigma_1, Sigma_2, dots in cal(F)$ is a countable sequence of pairwise disjoint events, then $bb(P) [union.big_(n=1)^infinity Sigma_n] = sum_(n=1)^infinity bb(P) [Sigma_n]$.
+
+  Here $(Omega, cal(F))$ is a measurable space if $Omega$ is a set and $cal(F)$ is a $sigma$-algebra over $Omega$.
+]
+
+
+#colorbox(title: [Random variable], color: silver)[
+  Let $(Omega, cal(F))$ and $(S, T)$ be measurable spaces. A random variable is a measurable function $x: Omega arrow S$.
+
+  A *measurable function* $x: Omega arrow S$ is one for which the preimage $x^(-1) (Sigma)$ of every measurable set $Sigma$ is measurable, i.e. $Sigma in T arrow.r.double x^(-1) (Sigma) in cal(F)$.
+]
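Note on the measure theory definitions in this commit: they can be sanity-checked on a finite sample space, where closure under countable unions reduces to closure under pairwise unions and countable additivity reduces to additivity over disjoint pairs. The following is only a minimal sketch under that finiteness assumption. The helper names (`is_sigma_algebra`, `is_probability_measure`, `is_measurable`) and the two-coin-toss example are hypothetical illustrations and are not part of `main.typ`.

```python
from itertools import combinations


def powerset(omega):
    """All subsets of a finite set, as frozensets."""
    items = list(omega)
    return {frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)}


def is_sigma_algebra(omega, family):
    """Check the three sigma-algebra axioms for a finite sample space."""
    omega = frozenset(omega)
    family = {frozenset(e) for e in family}
    if omega not in family:
        return False
    # Closed under complement.
    if any(omega - e not in family for e in family):
        return False
    # Finite case: closure under pairwise unions is enough.
    return all(a | b in family for a in family for b in family)


def is_probability_measure(omega, family, prob, tol=1e-12):
    """Check P: F -> [0, 1], P[Omega] = 1 and additivity on disjoint events."""
    omega = frozenset(omega)
    family = {frozenset(e) for e in family}
    if not is_sigma_algebra(omega, family) or set(prob) != family:
        return False
    if any(not (0.0 <= prob[e] <= 1.0) for e in family):
        return False
    if abs(prob[omega] - 1.0) > tol:
        return False
    # Finite case: additivity over disjoint pairs is enough.
    return all(abs(prob[a | b] - (prob[a] + prob[b])) <= tol
               for a in family for b in family if not (a & b))


def is_measurable(x, omega, family, target_family):
    """Check that the preimage of every set in target_family lies in family."""
    family = {frozenset(e) for e in family}
    return all(frozenset(w for w in omega if x(w) in s) in family
               for s in target_family)


# Hypothetical example: two coin tosses, events that only see the first toss.
omega = {"HH", "HT", "TH", "TT"}
first_toss_events = [set(), {"HH", "HT"}, {"TH", "TT"}, omega]
prob = {
    frozenset(): 0.0,
    frozenset({"HH", "HT"}): 0.5,
    frozenset({"TH", "TT"}): 0.5,
    frozenset(omega): 1.0,
}
target_events = powerset({0, 1})


def first_is_heads(w):
    return 1 if w[0] == "H" else 0


def second_is_heads(w):
    return 1 if w[1] == "H" else 0
```

In this example `first_is_heads` is measurable with respect to `first_toss_events`, while `second_is_heads` is not, because the preimage of `{1}` under it is `{"HH", "TH"}`, which is not an event in that family. Infinite sample spaces such as $Sigma^ast union Sigma^infinity$ cannot be checked by enumeration like this.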