The Normal Approximation to the Binomial

Most sets of Binomial tables cover only a limited range of values of n. This is not a big problem, because when n is larger than the tables allow we can use the normal approximation. However, there is a small trick that improves the approximation when n is only a little larger than the largest value available in the tables.

First, note that our random variable $S=\sum_{i=1}^{n}X_{i}$ is easily transformed into a sample average by dividing by n. Hence we can compute $$P[S>s]=P[n^{-1}S>n^{-1}s]=P[\bar{X}>n^{-1}s].$$ There is, however, an important detail buried in here. For a discrete random variable like the Binomial we have to be careful with inequalities, but the normal distribution is continuous, so under the normal approximation $P[Z<z]=P[Z\leq z]$ and the distinction disappears. The result is that we actually have two ways of computing any probability, because for integer s $$\begin{equation} P[S>s]=P[S\geq s+1]. \end{equation} $$ Applying the normal approximation to the left-hand side uses the cutoff s, while applying it to the right-hand side uses the cutoff s+1, so the two give different answers. For very large n the difference is likely too small to worry about, but for moderate n it can matter.
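As a minimal sketch (assuming Python 3.8+ for `math.comb` and `statistics.NormalDist`), the snippet below checks that the two forms of the event give identical exact Binomial probabilities but different normal approximations. The helper names `exact_cdf`, `exact_upper_tail` and `normal_upper_tail` are hypothetical, and the values n = 12, p = 0.5, s = 7 are chosen only for illustration.

```python
from math import comb, sqrt
from statistics import NormalDist


def exact_cdf(s: int, n: int, p: float) -> float:
    """Exact Binomial probability P[S <= s] for S ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(s + 1))


def exact_upper_tail(start: int, n: int, p: float) -> float:
    """Exact Binomial probability P[S >= start], summed directly."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(start, n + 1))


def normal_upper_tail(cutoff: float, n: int, p: float) -> float:
    """Normal approximation to P[S > cutoff], computed via Xbar = S / n.

    Because the normal is continuous, this is also the approximation to
    P[S >= cutoff]: the strict/non-strict distinction is lost.
    """
    se = sqrt(p * (1 - p) / n)  # standard deviation of Xbar
    return 1 - NormalDist(mu=p, sigma=se).cdf(cutoff / n)


n, p, s = 12, 0.5, 7

exact_gt = 1 - exact_cdf(s, n, p)            # P[S > s] computed as 1 - P[S <= s]
exact_geq = exact_upper_tail(s + 1, n, p)    # P[S >= s + 1] summed directly
approx_gt = normal_upper_tail(s, n, p)       # normal approximation using cutoff s
approx_geq = normal_upper_tail(s + 1, n, p)  # normal approximation using cutoff s + 1

print(f"exact:  {exact_gt:.4f} vs {exact_geq:.4f}")    # the same event, same value
print(f"normal: {approx_gt:.4f} vs {approx_geq:.4f}")  # two different answers
```

With p = 0.5 the Binomial is symmetric, so $P[S>7]=P[S\leq 4]$; the exact pair agrees, while the two normal values differ substantially.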

To see this, we will work through a few calculations. Let $n=12$ and $p=0.5$. We are interested in $P[S \leq 4]$. From the Binomial distribution we can work out that $P[S\leq 4]=\sum_{k=0}^{4}\binom{12}{k}0.5^{12}=794/4096\approx 0.194$. But what happens with our normal approximation?
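The exact value can be checked with rational arithmetic, which avoids any rounding. This is a minimal Python sketch; `binom_cdf_exact` is a hypothetical helper name, not part of any library.

```python
from fractions import Fraction
from math import comb


def binom_cdf_exact(s: int, n: int, p: Fraction) -> Fraction:
    """Exact P[S <= s] for S ~ Binomial(n, p), using rational arithmetic."""
    return sum(
        (comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(s + 1)),
        Fraction(0),
    )


prob = binom_cdf_exact(4, 12, Fraction(1, 2))
print(prob, float(prob))  # 397/2048 0.19384765625
```

Here 397/2048 is 794/4096 in lowest terms.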

First, try $P[S \leq 4]$. We have $$\begin{equation} \begin{split} P[S \leq 4] &= P[\bar{X} \leq \frac{4}{12}] \\ &= P[\frac{\bar{X}-0.5}{\sqrt{0.5 (1-0.5)/12}} \leq \frac{\frac{4}{12}-0.5}{\sqrt{0.5 (1-0.5)/12}}] \\ & \approx P[Z \leq -1.1547] \\ & = 0.1241. \end{split} \end{equation} $$ This is not especially close to the correct answer of 0.194 (remember, it is only an approximation).

But notice that, since S takes only integer values, we could also have calculated this as $P[S < 5]$. With this construction of the problem we have $$\begin{equation} \begin{split} P[S < 5] &= P[\bar{X} < \frac{5}{12}] \\ &= P[\frac{\bar{X}-0.5}{\sqrt{0.5 (1-0.5)/12}}<\frac{\frac{5}{12}-0.5}{\sqrt{0.5 (1-0.5)/12}}] \\ & \approx P[Z < -0.5774] \\ & = 0.282. \end{split} \end{equation} $$ The first of these approximations is too low and the second is too high, yet in some sense either should be correct as n gets large.

We could consider splitting the difference and working out $P[S < 4.5]$, using the cutoff halfway between the two values; this adjustment is known as the continuity correction. If we do this we end up with $$\begin{equation} \begin{split} P[S < 4.5] &= P[\bar{X} < \frac{4.5}{12}] \\ &= P[\frac{\bar{X}-0.5}{\sqrt{0.5 (1-0.5)/12}}<\frac{\frac{4.5}{12}-0.5}{\sqrt{0.5 (1-0.5)/12}}] \\ & \approx P[Z < -0.8660] \\ & = 0.1932. \end{split} \end{equation} $$ This is much closer, and for n near the upper end of the Binomial tables we should use it. However, most problems involve quite large n, in which case the difference between the three choices is not going to be noticeable. The question, as always, is how large n must be. One way to check is to compute the probability for both the $<$ and $\leq$ versions and confirm that the answers agree to your desired accuracy.
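The three calculations, together with the suggested check, can be reproduced in a short Python sketch (assuming Python 3.8+). The function names `exact_cdf`, `normal_cdf_approx` and `endpoint_gap` are hypothetical, and the second check case (n = 10000, s = 4950) is an illustrative choice, not one taken from the text.

```python
from math import comb, sqrt
from statistics import NormalDist


def exact_cdf(s: int, n: int, p: float) -> float:
    """Exact Binomial probability P[S <= s]."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(s + 1))


def normal_cdf_approx(cutoff: float, n: int, p: float) -> float:
    """Normal approximation to P[S <= cutoff], standardizing Xbar = S / n."""
    z = (cutoff / n - p) / sqrt(p * (1 - p) / n)
    return NormalDist().cdf(z)  # standard normal CDF, mu=0, sigma=1


def endpoint_gap(s: int, n: int, p: float) -> float:
    """Gap between treating the event as S < s + 1 and as S <= s."""
    return normal_cdf_approx(s + 1, n, p) - normal_cdf_approx(s, n, p)


n, p, s = 12, 0.5, 4
exact = exact_cdf(s, n, p)
as_leq = normal_cdf_approx(s, n, p)           # P[S <= 4], cutoff 4
as_lt = normal_cdf_approx(s + 1, n, p)        # P[S < 5], cutoff 5
corrected = normal_cdf_approx(s + 0.5, n, p)  # continuity correction, cutoff 4.5
print(f"exact {exact:.4f}  <= {as_leq:.4f}  < {as_lt:.4f}  corrected {corrected:.4f}")

# The suggested check: do the '<' and '<=' versions agree to the accuracy we need?
for n_check, s_check in [(12, 4), (10_000, 4_950)]:
    gap = endpoint_gap(s_check, n_check, 0.5)
    print(f"n={n_check}, s={s_check}: gap {gap:.4f}")
```

With n = 12 the two versions differ by roughly 0.16, while with n = 10000 and s = 4950 they differ by roughly 0.005; whether that is small enough depends on the accuracy you need.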