Student t Distributions

Some of you might have encountered t distributions before. We will not cover them; this page explains why.

Many older textbooks discuss the following testing situation. The distribution of each $X_i$ is $N(\mu, \sigma^2)$, but (as in the regular situation in the main part of the notes) we do not know $\sigma^2$. Instead, when we construct a t statistic we use the square root of the estimator $ s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2 $ in the denominator. This is our usual variance estimator. The key difference here is that we 'know' that the data are normally distributed.
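As a rough illustration of the statistic described above, the following Python sketch computes $s^2$ and the resulting t statistic using only the standard library. The function name `t_statistic`, the hypothesized mean `mu0`, and the sample values are assumptions made up for this example; they do not come from the notes.

```python
import math
import statistics

def t_statistic(xs, mu0):
    """t statistic for testing mean == mu0 when sigma is estimated from the data."""
    n = len(xs)
    x_bar = statistics.mean(xs)
    s2 = statistics.variance(xs)  # sample variance, divides by n - 1
    return (x_bar - mu0) / math.sqrt(s2 / n)

sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up data for illustration
t = t_statistic(sample, mu0=2.0)
print(round(t, 4))  # 1.4142
```

Here $\bar{x} = 3$ and $s^2 = 2.5$, so the standard error is $\sqrt{2.5/5}$ and the statistic is $1/\sqrt{0.5} = \sqrt{2}$.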

In this problem, the exact distribution of the t statistic under the null hypothesis is the t distribution with $n-1$ 'degrees of freedom'. The tables for critical values depend on the number of degrees of freedom. The distribution itself is bell shaped like the normal, but wider, with more probability in the tails. In many past exams, the trick played by professors was to ask the question with and without the normality assumption to check that you used the correct table: the normal table if using the central limit theorem, or the t distribution table when we know that the data are normally distributed.
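To see what 'wider than the normal' means in practice, here is a small simulation sketch, again in standard-library Python. It draws many small normal samples with true mean 0, computes the t statistic for each, and counts how often it falls beyond the normal-table cutoff of 1.96. The helper name `simulate_t`, the sample size of 5, the number of repetitions, and the seed are all arbitrary choices for this illustration.

```python
import math
import random
import statistics

def simulate_t(n, reps, seed=0):
    """Simulate t statistics for samples of size n drawn from N(0, 1)."""
    rng = random.Random(seed)
    ts = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        se = statistics.stdev(xs) / math.sqrt(n)  # estimated standard error
        ts.append(statistics.mean(xs) / se)       # true mean is 0
    return ts

ts = simulate_t(n=5, reps=10000)
frac_beyond = sum(abs(t) > 1.96 for t in ts) / len(ts)
print(f"share of |t| > 1.96 with n = 5: {frac_beyond:.3f}")
```

If the normal table were the right one, about 5% of the simulated statistics would land beyond 1.96. With only four degrees of freedom the share should come out well above that, which is why the t table gives larger critical values for small $n$.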

So why do we not do this here? Despite the wonderful story of the origins of the solution to this problem (see below), this problem is rarely encountered in practice. The basic idea behind it is that we are absolutely sure of the shape of the distribution of the $X_i$, namely that it is normal, but have no idea how spread out the distribution is. This is clearly ridiculous for most problems. For most problems involving continuous random variables, we have no idea what the true distribution of $X_i$ is. This is why the central limit theorem is so wonderful: it allows us to get an approximately correct result despite not knowing the distribution. You will never see the t distribution used in this way in published papers in good journals. It does have other uses, but these are outside the topics covered in this class.

So what is the story? In the early 1900s W.S. Gosset was working on a particular production process and was deeply interested in using new ideas from statistics to improve it. His first degree was in mathematics, so he knew a bit. He knew how to construct t statistics, but realized that he had a practical problem: if he estimated the standard deviation from the data, the t statistic would no longer have a normal distribution, even when the underlying data were normal. Faced with this problem, he solved it and published the solution. He published under the pseudonym 'Student', perhaps so as not to sully his industry by associating it with statistics. And what production process was he working on? He was the Head Experimental Brewer at Guinness Breweries, so he was using statistics to get better beer. He remained a brewer for most of his life, but along the way published many other papers on using statistics to improve industrial processes.