The Monte Carlo (MC) method is a numerical method that uses random numbers to solve mathematical problems for which an analytical solution is not known.
A simple MC example
A classic example is the determination of $\pi$. Suppose we have a circle with radius $r=1$ inscribed within a square of side $2r$. Then the area ratio is
$$
\frac{A_{\mathrm{circle}}}{A_{\mathrm{square}}}=\frac{\pi r^2}{4r^2}=\frac{\pi}{4}
$$
To determine $\pi$ we randomly select $n$ points, $p_i=(x_i,y_i)$ for $i=1\ldots n$, in the square and approximate the area ratio by
$$
\frac{A_{\mathrm{circle}}}{A_{\mathrm{square}}}\approx \frac{m}{n}
\rightarrow \frac{\pi}{4}\approx \frac{m}{n}
$$
where $m$ is the number of random points in the circle.
The method is called hit-or-miss Monte Carlo.
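A minimal Python sketch of the hit-or-miss estimator (the function name `estimate_pi` and the use of NumPy are my own illustrative choices):

```python
import numpy as np

def estimate_pi(n, seed=0):
    """Hit-or-miss MC: draw n points uniformly in [-1, 1]^2 and count hits
    inside the unit circle; the hit fraction m/n approximates pi / 4."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, n)
    y = rng.uniform(-1.0, 1.0, n)
    m = np.count_nonzero(x**2 + y**2 <= 1.0)  # hits inside the circle
    return 4.0 * m / n

print(estimate_pi(1_000_000))  # ~3.14
```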
MC Integration
Let $x$ be a random variable with pdf $p(x)$ on $[a,b]$, let $f(x)$ be an arbitrary continuous function, and let $y=f(x)$ be a new random variable. Then the expected value and variance of $y$ are
$$
E[y]=E[f(x)]=\int_{a}^b f(x)p(x)\dd{x}\\
\mathrm{Var}[y]=\mathrm{Var}[f(x)]=\int_{a}^{b}(f(x)-E[f(x)])^2p(x)\dd{x}
$$
Our goal is to estimate the expectation of $f(x)$ without computing the integral analytically. This can be achieved with an MC simulation.
Crude Monte Carlo
A simple estimate of the integral can be obtained by generating $n$ samples $x_i\sim p(x)$, for $i=1\ldots n$, and computing the estimate
$$
I=\frac{1}{n}\sum_{i=1}^n f(x_i)
$$
This estimate is based on two theorems.
The law of large numbers: the average of a sequence of random variables drawn from a known distribution converges to its expectation as the number of samples goes to infinity.
$$
I=\frac{1}{n}\sum_{i=1}^nf(x_i)
\quad \rightarrow \quad E[f(x)]=\int_{a}^{b}f(x)p(x)\dd{x}
$$
The central limit theorem: the sum of a large number of independent random variables is approximately normally distributed.
The theorem does not strictly require the random variables to be identically and independently distributed (i.i.d.); the i.i.d. case is just one special situation (see the Baidu Baike article on the central limit theorem, 中心极限定理_百度百科, baidu.com).
For a sum of i.i.d. random variables $z_i$ with mean $\mu$ and finite variance $\sigma^2$,
$$
\frac{\sum_{i=1}^{n}z_i-n\mu}{\sigma\sqrt{n}}\rightarrow \mathcal{N}(0,1) \quad \mathrm{as} \quad n\rightarrow \infty
$$
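A small numerical check of this statement, using Exp(1) variables ($\mu=\sigma^2=1$) purely as an illustrative choice:

```python
import numpy as np

# Standardize sums of n i.i.d. Exp(1) variables (mu = 1, sigma^2 = 1) over many
# replications; by the CLT the result is approximately N(0, 1) for large n.
rng = np.random.default_rng(0)
n, reps = 1_000, 10_000
z = rng.exponential(scale=1.0, size=(reps, n))
standardized = (z.sum(axis=1) - n * 1.0) / (1.0 * np.sqrt(n))
print(standardized.mean(), standardized.std())  # close to 0 and 1; a histogram looks Gaussian
```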
If the variance of $f(x)$ is finite, then:
- the MC estimate is consistent,
- the MC estimate is asymptotically unbiased,
- the MC estimate is asymptotically normally distributed,
- the standard deviation of the MC estimate is $\sigma=\frac{\sqrt{\mathrm{Var}[f(x)]}}{\sqrt{n}}$.
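The crude MC estimate and its standard error take only a few lines of code. Below is a minimal sketch; the helper name `crude_mc` and the test integrand are illustrative assumptions, not part of any reference implementation:

```python
import numpy as np

def crude_mc(f, sampler, n, seed=0):
    """Crude Monte Carlo estimate of E[f(x)], x ~ p, and its standard error.

    `sampler(rng, n)` must return n samples drawn from the density p(x).
    """
    rng = np.random.default_rng(seed)
    fx = f(sampler(rng, n))
    estimate = fx.mean()                     # I = (1/n) sum_i f(x_i)
    std_error = fx.std(ddof=1) / np.sqrt(n)  # sigma = sqrt(Var[f(x)]) / sqrt(n)
    return estimate, std_error

# Example: E[x^2] with x ~ Uniform(0, 1) (p(x) = 1 on [0, 1]); exact value 1/3.
est, err = crude_mc(lambda x: x**2, lambda rng, n: rng.uniform(0.0, 1.0, n), 100_000)
print(est, err)
```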
Also note that MC methods are immune to the curse of dimensionality. According to the expression for the error, the accuracy can be improved by increasing the number of samples $n$, but the convergence, of order $1/\sqrt{n}$, is very slow. A better way to improve the accuracy is to decrease the variance $\mathrm{Var}[f(x)]$. Such methods are called variance-reducing techniques.
Variance Reducing Techniques
Here I only introduce the importance sampling technique. Read the document Tutorial on Monte Carlo Techniques for more variance-reducing techniques.
Importance Sampling
The pdf under the integral, $p(x)$, may not be the best pdf for MC integration: it may be difficult to draw samples from $p(x)$ on a computer. In this case we want to use a different, simpler pdf $q(x)$ from which we can draw samples. $q(x)$ is called the importance density or the proposal distribution.
Therefore we can write
$$
\begin{split}
E[y]=E[f(x)] &= \int_a^b f(x)p(x)\dd x\\
&= \int_a^b \frac{f(x)p(x)}{q(x)} q(x)\dd x\\
&= E\left[\frac{f(x)p(x)}{q(x)}\right]
\end{split}
$$
By generating $n$ samples $x_i\sim q(x)$, for $i=1\ldots n$, the estimate becomes
$$
I=\frac{1}{n}\sum_{i=1}^{n}\frac{f(x_i)p(x_i)}{q(x_i)}=\frac{1}{n}\sum_{i=1}^{n}W(x_i)f(x_i)
$$
where $W(x_i)=\frac{p(x_i)}{q(x_i)}$ are the importance weights.
Each sample $x_i\sim q(x)$ represents a position, and the weight $W(x_i)$ represents how probable that position is under the target, since $W(x_i)=\frac{p(x_i)}{q(x_i)}=\frac{\mathrm{target\space distribution}}{\mathrm{proposal\space distribution}}$. In other words, the probability that was represented by the density over a sub-domain is converted into positions and their corresponding weights.
Since $p(x)$ may be known only up to a normalizing constant, we normalize the weights so that they sum to one,
$$
I=\frac{\frac{1}{n}\sum_{i=1}^{n}W(x_i)f(x_i)}{\frac{1}{n}\sum_{i=1}^{n}W(x_i)}=\sum_{i=1}^{n}W_n(x_i)f(x_i)
$$
where $W_n(x_i)=\frac{W(x_i)}{\sum_{j=1}^nW(x_j)}$ are the normalized importance weights.
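A minimal sketch of the self-normalized estimator, assuming (as an illustrative choice) a standard-normal target known only up to its normalizing constant and a wider normal proposal; the function name `self_normalized_is` is also just an assumption:

```python
import numpy as np

def self_normalized_is(f, log_p_tilde, q_sampler, log_q, n, seed=0):
    """Self-normalized importance sampling estimate of E_p[f(x)].

    log_p_tilde may be the target log-density up to an unknown additive
    constant; the constant cancels in the normalized weights W_n(x_i).
    """
    rng = np.random.default_rng(seed)
    x = q_sampler(rng, n)                  # x_i ~ q(x)
    log_w = log_p_tilde(x) - log_q(x)      # log W(x_i) = log p(x_i) - log q(x_i)
    w = np.exp(log_w - log_w.max())        # shift by the max for numerical stability
    w_n = w / w.sum()                      # normalized weights W_n(x_i), sum to one
    return np.sum(w_n * f(x))

# Example: target p = N(0, 1) known only up to a constant, proposal q = N(0, 2^2),
# f(x) = x^2, so the exact answer is 1. Additive constants in log_q also cancel.
est = self_normalized_is(
    f=lambda x: x**2,
    log_p_tilde=lambda x: -0.5 * x**2,              # unnormalized standard normal
    q_sampler=lambda rng, n: rng.normal(0.0, 2.0, n),
    log_q=lambda x: -0.5 * (x / 2.0) ** 2,          # N(0, 4) log-density up to a constant
    n=200_000,
)
print(est)  # close to 1
```

Working in log space and subtracting the maximum before exponentiating is a common trick to avoid overflow or underflow of the weights.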
The variance of the importance sampler estimate is
$$
\mathrm{Var}_q[I]=\frac{1}{n}\int \frac{[f(x)p(x)]^2}{q(x)} \dd{x}-\frac{E_p^2[f(x)]}{n}
$$
To reduce the variance, $q(x)$ should be chosen to match the shape of $p(x)$, or better, of $|f(x)|p(x)$.
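This effect can be seen numerically. Below is a small comparison, assuming (purely as an illustration) $p(x)$ uniform on $[0,1]$, $f(x)=e^x$ (exact expectation $e-1$), and a proposal $q(x)=(1+x)/1.5$ that roughly follows the shape of $f(x)p(x)$:

```python
import numpy as np

# Crude MC vs importance sampling for I = E_p[e^x] with p = Uniform(0, 1);
# the exact value is e - 1 ~ 1.718. The proposal q(x) = (1 + x) / 1.5 on [0, 1]
# roughly follows the shape of f(x) p(x) = e^x.
rng = np.random.default_rng(0)
n = 100_000

# Crude MC: x ~ Uniform(0, 1).
x = rng.uniform(0.0, 1.0, n)
fx = np.exp(x)
print("crude MC  :", fx.mean(), "+/-", fx.std(ddof=1) / np.sqrt(n))

# Importance sampling: x ~ q via the inverse CDF, weights W(x) = p(x) / q(x).
u = rng.uniform(0.0, 1.0, n)
x_q = -1.0 + np.sqrt(1.0 + 3.0 * u)       # inverse CDF of q(x) = (1 + x) / 1.5
w_fx = np.exp(x_q) / ((1.0 + x_q) / 1.5)  # W(x) f(x) with p(x) = 1
print("importance:", w_fx.mean(), "+/-", w_fx.std(ddof=1) / np.sqrt(n))
```

The importance sampling estimate has a smaller standard error because the weighted integrand $W(x)f(x)=f(x)p(x)/q(x)$ varies less than $f(x)$ itself.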
Sequential Monte Carlo
In this section, we focus on sequential state estimation using sequential Monte Carlo.
Sequential Importance Sampling
Consider the sequential situation
$$
E[f(\mathrm{x}_{1:n})] = \int f(\mathrm{x}_{1:n})p(\mathrm{x}_{1:n})\dd{\mathrm{x}_{1:n}}
$$
Here $\mathrm{x}_{1:n}$ means $x_1,x_2,\ldots,x_n$.
Applying the same procedure as above gives
$$
E[f(\mathrm{x}_{1:n})] =
\int f(\mathrm{x}_{1:n}) \frac{p(\mathrm{x}_{1:n})}{q(\mathrm{x}_{1:n})}q(\mathrm{x}_{1:n})\dd{\mathrm{x}_{1:n}}
$$
from which we get the importance weight $W(\mathrm{x}_{1:n})=\frac{p(\mathrm{x}_{1:n})}{q(\mathrm{x}_{1:n})}$ and the proposal distribution $q(\mathrm{x}_{1:n})$.
We can rewrite the importance weight in recursive form.
$$
\begin{split}
W(\mathrm{x}_{1:n}) &= \frac{p(\mathrm{x}_{1:n})}{q(\mathrm{x}_{1:n})}\\
&= \frac{p(\mathrm{x}_{1:n})}{q(x_n|\mathrm{x}_{1:n-1})q(\mathrm{x}_{1:n-1})}\\
&= \frac{p(\mathrm{x}_{1:n})}{q(x_n|\mathrm{x}_{1:n-1})q(\mathrm{x}_{1:n-1})}\cdot \frac{p(\mathrm{x}_{1:n-1})}{p(\mathrm{x}_{1:n-1})}\\
&= \frac{p(\mathrm{x}_{1:n})}{q(x_n|\mathrm{x}_{1:n-1})p(\mathrm{x}_{1:n-1})}\cdot \frac{p(\mathrm{x}_{1:n-1})}{q(\mathrm{x}_{1:n-1})}\\
&= \frac{p(\mathrm{x}_{1:n})}{q(x_n|\mathrm{x}_{1:n-1})p(\mathrm{x}_{1:n-1})}\cdot W(\mathrm{x}_{1:n-1})
\end{split}
$$
Directly sampling $\mathrm{x}_{1:n}$ from the $n$-dimensional joint proposal is equivalent to first sampling $\mathrm{x}_{1:n-1}$ and then sampling $x_n$ conditionally. To obtain the weight of the $i$-th sample at $t=n$ from its weight at $t=n-1$, we draw the new state $x_n^{[i]}$ from $q(x_n|\mathrm{x}_{1:n-1}^{[i]})$ and compute
$$
W(\mathrm{x}_{1:n}^{[i]})=\frac{p(\mathrm{x}_{1:n}^{[i]})}{q(x_n^{[i]}|\mathrm{x}_{1:n-1}^{[i]})p(\mathrm{x}_{1:n-1}^{[i]})}W(\mathrm{x}_{1:n-1}^{[i]})
$$
This recursion is the main feature of sequential importance sampling. The remaining question is how to choose the proposal distribution $q(x_n|\mathrm{x}_{1:n-1})$.
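To make the recursion concrete, here is a minimal sketch of sequential importance sampling for state estimation. The scalar linear-Gaussian model, the parameter values, and the choice of the transition density as the proposal are all illustrative assumptions; with that proposal the incremental weight reduces to the observation likelihood $p(y_n|x_n)$ up to a constant.

```python
import numpy as np

# Sequential importance sampling for an (assumed) scalar linear-Gaussian model:
#   x_n = a * x_{n-1} + process noise,   y_n = x_n + measurement noise.
# The proposal q(x_n | x_{1:n-1}) is taken to be the transition density, so the
# incremental weight is the likelihood p(y_n | x_n) up to a constant factor.
rng = np.random.default_rng(0)
a, q_std, r_std = 0.9, 0.5, 0.3           # illustrative model parameters
T, n_particles = 50, 1_000

# Simulate a state trajectory and observations from the model.
x_true = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x_true[t] = a * x_true[t - 1] + q_std * rng.standard_normal()
    y[t] = x_true[t] + r_std * rng.standard_normal()

particles = np.zeros(n_particles)         # x_0^{[i]}
log_w = np.zeros(n_particles)             # log W(x_{1:0}^{[i]})
for t in range(1, T):
    # Draw x_n^{[i]} from the proposal q(x_n | x_{1:n-1}^{[i]}) = p(x_n | x_{n-1}^{[i]}).
    particles = a * particles + q_std * rng.standard_normal(n_particles)
    # Update the weights: multiply by p(y_n | x_n^{[i]}) (log space, constant dropped).
    log_w += -0.5 * ((y[t] - particles) / r_std) ** 2
    # Normalized weights and the weighted estimate of the current state.
    w_n = np.exp(log_w - log_w.max())
    w_n /= w_n.sum()
    x_hat = np.sum(w_n * particles)

print(x_true[-1], x_hat)
```

As $T$ grows, most of the normalized weight tends to concentrate on a few particles, which is one reason the choice of the proposal distribution matters.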