Commit

Add some more measure theory content
Signed-off-by: Thomas Gassmann <tgassmann@student.ethz.ch>
thomasgassmann committed Feb 26, 2025
1 parent 1967301 commit fafdb92
Showing 1 changed file with 31 additions and 2 deletions.
33 changes: 31 additions & 2 deletions large-language-models/main.typ
@@ -36,9 +36,12 @@ An *energy function* is a function $hat(p) : Sigma^ast arrow RR$.
Any normalizable energy function $hat(p)_"GN"$ (meaning $Z_G$ is finite) induces a language model, i.e., a distribution over $Sigma^ast$.

#colorbox(title: [Sequence model], color: silver)[
For an alphabet $Sigma$ a sequence model is defined as a set of conditional probability distributions $p_"SM"(y | bold(y))$ for $y in Sigma, bold(y) in Sigma^ast$. $bold(y)$ is called history/context.
For an alphabet $Sigma$, a sequence model is a set of conditional probability distributions $p_"SM"(y | bold(y))$ for $y in Sigma$ (mostly we use
$overline(Sigma) = Sigma union \{ EOS \}$) and $bold(y) in Sigma^ast$. $bold(y)$ is called the history or context. That is, we have one distribution per context: $\{ p_"SM" (y | bold(y)) \}_(bold(y) in Sigma^ast)$ with $y in overline(Sigma)$.
]

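A sequence model induces a probability distribution over $Sigma^ast union Sigma^infinity$, i.e. over finite and infinite sequences, so it may place probability mass on infinite sequences.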

#colorbox(title: [Locally normalized model / autoregressive model])[
For $p_"SM"$ a sequence model over $overline(Sigma)$: A locally normalized language model (LNM) over $Sigma$ is defined as:
$
@@ -55,4 +58,30 @@ Any normalizable energy function $hat(p)_"GN"$ (meaning $Z_G$ is finite) induces
i.e. the cumulative probability of all strings in the language beginning with $bold(y)$.
]

Any language model can be locally normalized.
Any language model can be locally normalized. TODO: should know how to prove this, telescoping product
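
The telescoping argument can be checked numerically on a small example. The following is a minimal sketch, not a proof: it assumes a toy language model `p_lm` with finite support over $Sigma = \{a, b\}$, and the names `prefix_prob`, `conditional`, `lnm_prob` and the marker `EOS` are hypothetical, chosen only for illustration. Each conditional is a ratio of prefix probabilities, so multiplying them along a string cancels everything except the probability of the string itself.

```python
from fractions import Fraction

EOS = "<eos>"  # hypothetical end-of-sequence marker, not part of the alphabet

# Toy language model over Sigma = {"a", "b"} (assumed for illustration only).
p_lm = {
    "": Fraction(1, 4),
    "a": Fraction(1, 4),
    "ab": Fraction(1, 3),
    "ba": Fraction(1, 6),
}

def prefix_prob(prefix):
    """Total probability of all strings in the support that start with `prefix`."""
    return sum(p for s, p in p_lm.items() if s.startswith(prefix))

def conditional(symbol, context):
    """p_SM(symbol | context); only defined when prefix_prob(context) > 0."""
    z = prefix_prob(context)
    if symbol == EOS:
        return p_lm.get(context, Fraction(0)) / z
    return prefix_prob(context + symbol) / z

def lnm_prob(string):
    """Probability of `string` as a product of local conditionals, ending with EOS."""
    prob = Fraction(1)
    for t, symbol in enumerate(string):
        prob *= conditional(symbol, string[:t])
    return prob * conditional(EOS, string)

# The product telescopes: every string in the support gets back its original mass.
for s, p in p_lm.items():
    assert lnm_prob(s) == p
```

Exact `Fraction` arithmetic is used so that the equality check is not affected by floating-point rounding.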

== Measure theory

Traditionally, a probability space is a triple $(Omega, cal(F), bb(P))$ where $cal(F) subset.eq cal(P) (Omega)$ is the set of events and $bb(P)$ is a measure with $bb(P) [Omega] = 1$. To resolve e.g. the paradox of the infinite coin toss, we require $cal(F)$ to be a $sigma$-algebra.

#colorbox(title: [$sigma$-algebra], color: silver)[
A set $cal(F) subset.eq cal(P) (Omega)$ is called a $sigma$-algebra if:
- $Omega in cal(F)$
- $Sigma in cal(F) arrow.r.double Sigma^complement in cal(F)$
- If $Sigma_1, Sigma_2, dots in cal(F)$, then $union.big_(n=1)^infinity Sigma_n in cal(F)$
]
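
For a finite $Omega$ the three axioms can be checked mechanically. The sketch below is only an illustration under that finiteness assumption: a finite family contains only finitely many distinct sets, so closure under countable unions reduces to closure under pairwise unions. The function name `is_sigma_algebra` and the example families are hypothetical.

```python
from itertools import combinations

def is_sigma_algebra(omega, family):
    """Check the sigma-algebra axioms for a family of subsets of a finite set."""
    omega = frozenset(omega)
    family = {frozenset(s) for s in family}
    # Axiom 1: the whole space is an event.
    if omega not in family:
        return False
    # Axiom 2: closed under complement.
    if any(omega - s not in family for s in family):
        return False
    # Axiom 3: closed under unions (pairwise suffices for a finite family).
    return all(a | b in family for a, b in combinations(family, 2))

omega = {1, 2, 3, 4}
trivial = [set(), omega]
generated = [set(), {1, 2}, {3, 4}, omega]
not_closed = [set(), {1}, {2}, omega]  # complement of {1} is missing

assert is_sigma_algebra(omega, trivial)
assert is_sigma_algebra(omega, generated)
assert not is_sigma_algebra(omega, not_closed)
```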

#colorbox(title: [Probability measure], color: silver)[
A probability measure $bb(P)$ over a measurable space $(Omega, cal(F))$ is a function $bb(P): cal(F) arrow [0,1]$ s.t.:
- $bb(P) [Omega] = 1$
- If $Sigma_1, Sigma_2, dots in cal(F)$ is a countable sequence of disjoint events, then $bb(P) [union.big_(n=1)^infinity Sigma_n] = sum_(n=1)^infinity bb(P) [Sigma_n]$.

Here $(Omega, cal(F))$ is a measurable space if $Omega$ is a set and $cal(F)$ is a $sigma$-algebra over $Omega$.
]
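
As a small sanity check, the two axioms can be verified on a finite example. This sketch assumes two fair coin flips as $Omega$ and takes all subsets as events. On a finite space, countable additivity reduces to additivity over disjoint pairs. All names (`omega`, `weight`, `prob`, `is_probability_measure`) are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations

omega = ["HH", "HT", "TH", "TT"]  # two fair coin flips (assumed example)
weight = {w: Fraction(1, 4) for w in omega}

def powerset(items):
    """All subsets of `items` as frozensets."""
    items = list(items)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]

events = powerset(omega)  # here the sigma-algebra is the full power set

def prob(event):
    """P[event] as the sum of the weights of its outcomes."""
    return sum((weight[w] for w in event), Fraction(0))

def is_probability_measure(events, prob, omega):
    """Check P[omega] = 1, values in [0, 1], and additivity on disjoint pairs."""
    if prob(frozenset(omega)) != 1:
        return False
    if any(not 0 <= prob(e) <= 1 for e in events):
        return False
    return all(
        prob(a | b) == prob(a) + prob(b)
        for a, b in combinations(events, 2)
        if a.isdisjoint(b)
    )

assert is_probability_measure(events, prob, omega)
```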


#colorbox(title: [Random variable], color: silver)[
Let $(Omega, cal(F))$ and $(S, T)$ be measurable spaces. A random variable is a measurable function from $Omega$ to $S$.

A *measurable function* $x: Omega arrow S$ is such that $x^(-1) (Sigma)$ is measurable for $Sigma$ measurable, i.e. $Sigma in T arrow.r.double x^(-1) (Sigma) in cal(F)$.
]
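
Measurability depends on the $sigma$-algebra chosen on $Omega$, which a finite example makes concrete. The sketch below assumes two coin flips as $Omega$ and a coarse $sigma$-algebra that only records the first flip. The number of heads is then not measurable, while the indicator of "first flip is heads" is. The names `preimage`, `is_measurable`, `num_heads` and `first_is_heads` are hypothetical.

```python
from itertools import chain, combinations

def powerset(items):
    """All subsets of `items` as frozensets."""
    items = list(items)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]

def preimage(f, omega, target_event):
    """All outcomes in omega that f maps into target_event."""
    return frozenset(w for w in omega if f(w) in target_event)

def is_measurable(f, omega, sigma_omega, sigma_target):
    """Check that the preimage of every target event is an event on omega."""
    sigma_omega = {frozenset(e) for e in sigma_omega}
    return all(preimage(f, omega, e) in sigma_omega for e in sigma_target)

omega = ["HH", "HT", "TH", "TT"]
# Coarse sigma-algebra: only the outcome of the first flip is observable.
first_flip_algebra = [set(), {"HH", "HT"}, {"TH", "TT"}, set(omega)]

def num_heads(w):
    return w.count("H")

def first_is_heads(w):
    return int(w[0] == "H")

# The preimage of {2} under num_heads is {"HH"}, which is not an event here.
assert not is_measurable(num_heads, omega, first_flip_algebra, powerset({0, 1, 2}))
assert is_measurable(first_is_heads, omega, first_flip_algebra, powerset({0, 1}))
```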
