Formal Proof of Convergence in Probability

This page presents a formal proof of convergence in probability, the main engine behind the law of large numbers. The material goes beyond the regular coursework, but the math is not really that hard.

The sample mean is defined as $$ \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i. $$ Recall that $X_1, X_2, \ldots, X_n$ are independent random variables, each with mean $\mu$ and variance $\sigma^2$. We want to show that $$ P\left[|\bar{X}_n-\mu| \ge \epsilon\right] \rightarrow 0 \quad \text{as } n \rightarrow \infty $$ for every fixed $\epsilon > 0$. This is what we mean by convergence in probability, and it is the law of large numbers.
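Before proving the statement, it can help to watch it happen numerically. Below is a minimal Monte Carlo sketch, assuming NumPy is available; the normal distribution and the values of $\mu$, $\sigma$, $\epsilon$, and the trial count are illustrative choices only, since the proof needs nothing beyond independence and a finite variance.

```python
import numpy as np

rng = np.random.default_rng(0)

mu, sigma = 2.0, 3.0   # illustrative mean and standard deviation
epsilon = 0.5          # the tolerance in the probability statement
trials = 10_000        # repetitions used to estimate the probability

for n in [10, 100, 1_000]:
    # Draw `trials` independent samples of size n and form each sample mean.
    xbar = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)
    # Estimate P[|Xbar_n - mu| >= epsilon] as the fraction of trials that miss.
    p_hat = np.mean(np.abs(xbar - mu) >= epsilon)
    print(f"n = {n:>5}: P[|Xbar_n - mu| >= {epsilon}] ~ {p_hat:.4f}")
```

The estimated probability should visibly shrink as $n$ grows, which is exactly the claim we now prove.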

The proof has two steps. The first step rewrites the probability in terms of the variance of the sample mean. The second shows that, once rewritten this way, the probability shrinks to zero as the sample size $n$ increases.

The first step is known as the Chebyshev inequality. First, note that $(\bar{X}_n-\mu)^2 \ge 0$; in words, the squared distance from the sample mean to the actual mean is non-negative (because we are taking a square). We start from the definition of the variance, writing $p_{\bar{X}_n}$ for the density of $\bar{X}_n$: $$ \begin{aligned} Var(\bar{X}_n) &= \int_{-\infty}^{\infty} (\bar{x} - \mu)^2 \, p_{\bar{X}_n}(\bar{x}) \, d\bar{x} \\ &\ge \int_{\{\bar{x} \,:\, (\bar{x} - \mu)^2 \ge r\}} (\bar{x} - \mu)^2 \, p_{\bar{X}_n}(\bar{x}) \, d\bar{x} \\ &\ge r \int_{\{\bar{x} \,:\, (\bar{x} - \mu)^2 \ge r\}} p_{\bar{X}_n}(\bar{x}) \, d\bar{x} \\ &= r \, P\left((\bar{X}_n - \mu)^2 \ge r\right). \end{aligned} $$ In the second line we cut out the part of the integration where $(\bar{x} - \mu)^2 < r$, that is, the interval $\mu - \sqrt{r} < \bar{x} < \mu + \sqrt{r}$. Since both the density and $(\bar{x} - \mu)^2$ are non-negative, leaving part of the integration out can only make the integral smaller; hence the inequality. For the third line we replace $(\bar{x} - \mu)^2$ with $r$, which is no larger anywhere in the remaining region (we integrate only over values where this square is at least $r$), so again the right-hand side can only shrink. This leaves the integral of the density over the region; recall that the area under the density curve is a probability, which gives the last line.

We can rearrange this as $$ P\left((\bar{X}_n - \mu)^2 \ge r\right) \le \frac{Var(\bar{X}_n)}{r}, $$ which is the form of the Chebyshev inequality we use in the next step.
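As a quick sanity check on the inequality, here is a sketch comparing the empirical left-hand side to the bound. Again this assumes NumPy, and the normal distribution and parameter values are illustrative; Chebyshev holds for any distribution with finite variance.

```python
import numpy as np

rng = np.random.default_rng(1)

mu, sigma, n = 2.0, 3.0, 50   # illustrative values
trials = 100_000

# Sample means of n independent draws; here Var(Xbar_n) = sigma**2 / n.
xbar = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)
var_xbar = sigma**2 / n

for r in [0.1, 0.5, 1.0, 2.0]:
    lhs = np.mean((xbar - mu) ** 2 >= r)   # empirical P[(Xbar_n - mu)^2 >= r]
    rhs = var_xbar / r                     # Chebyshev bound Var(Xbar_n) / r
    print(f"r = {r:>4}: P ~ {lhs:.4f}  <=  bound {rhs:.4f}")
```

Note that for small $r$ the bound can exceed 1; the inequality is still valid there, just uninformative.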

We can now turn to the quantity we want to show goes to zero. $$ \begin{aligned} P\left[|\bar{X}_n - \mu| \ge \epsilon\right] &= P\left[(\bar{X}_n - \mu)^2 \ge \epsilon^2\right] \\ &\le \frac{Var(\bar{X}_n)}{\epsilon^2} \\ &= \frac{\sigma^2}{n \epsilon^2} \\ &\rightarrow 0. \end{aligned} $$ The first line just squares both sides inside the brackets (both are non-negative, so the two events are the same). The second line uses the result we derived above with $r = \epsilon^2$. The third line uses $$ Var(\bar{X}_n) = Var\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} Var(X_i) = \frac{\sigma^2}{n}, $$ where splitting the variance of the sum into the sum of the variances is exactly where independence comes in. Because this variance goes to zero as $n$ gets large, the right-hand side goes to zero, giving us our convergence in probability, or law of large numbers, result. Notice that the variance going to zero is also what drives our pictures of the result: the distribution of $\bar{X}_n$ gets thinner and thinner around $\mu$.
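To close the loop, here is a sketch comparing the empirical tail probability to the final bound $\sigma^2/(n\epsilon^2)$, with the same illustrative NumPy setup and parameter choices as before.

```python
import numpy as np

rng = np.random.default_rng(2)

mu, sigma, epsilon = 2.0, 3.0, 0.5   # illustrative values
trials = 10_000

for n in [10, 100, 1_000]:
    xbar = rng.normal(mu, sigma, size=(trials, n)).mean(axis=1)
    p_hat = np.mean(np.abs(xbar - mu) >= epsilon)   # empirical tail probability
    bound = sigma**2 / (n * epsilon**2)             # the sigma^2 / (n eps^2) bound
    print(f"n = {n:>5}: P ~ {p_hat:.4f}  <=  bound {bound:.4f}")
```

With these values the bound is $36/n$, so at $n = 10$ it exceeds 1 and says nothing, and the empirical probability typically sits well below it; Chebyshev is a loose bound, but both head to zero, which is all the proof needs.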