The p-value Controversy

The p-value has come under criticism at various times. The issue hit the headlines (in science at least) a few years back when a psychology journal, Basic and Applied Social Psychology, banned the use of p-values in its pages. The new editor at the time had been a longtime critic of p-values. The American Statistical Association even felt the need to issue a statement on the use of p-values to counter the arguments going around at the time (see here).

We have seen in the notes that the p-value is simply an alternative way of reporting the results of a hypothesis test, so criticism of the p-value would seem to be criticism of hypothesis testing itself. In part it is. Hypothesis testing is very useful when we actually need to test a hypothesis that we might go on to rely on. For example, in a drug trial we might really want to know whether there is evidence that the new drug outperforms the old one. Hypothesis tests, and thus p-values, are an important tool in such a situation.

But hypothesis tests are sometimes run in situations where there is no point to them. Suppose you do not really have a hypothesis to test, but are trying to measure some effect. Researchers will often test by rote that the effect is zero, as though this were an interesting hypothesis, even when there is no reason to believe it is. For example, for a treatment of mild depression we might want to measure the effect without seriously thinking that the treatment has no effect at all. Here it makes more sense to report a confidence interval. An effect measured so poorly that we cannot reject that it is zero might also be one for which we cannot reject that it is very large. A failure to reject zero might then be misread as evidence that there is no effect, when large effects are also consistent with the data. So for some problems hypothesis testing, and hence p-values, might be misleading.
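To make the confidence interval point concrete, here is a minimal sketch in Python. The numbers are invented for illustration (an estimated effect of 4.0 with a standard error of 3.0, and 8.0 standing in for a "very important" effect), and the function names are hypothetical. It also assumes the estimate is approximately normally distributed, so this is an illustration of the idea rather than a recipe for any particular study.

```python
from statistics import NormalDist


def two_sided_p_value(estimate, std_error, null_value):
    """Two-sided p-value for H0: effect == null_value (normal approximation)."""
    z = (estimate - null_value) / std_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


def confidence_interval(estimate, std_error, level=0.95):
    """Confidence interval for the effect (normal approximation)."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z_crit * std_error, estimate + z_crit * std_error


# Hypothetical, poorly measured effect: the standard error is large
# relative to the estimate.
estimate, std_error = 4.0, 3.0
large_effect = 8.0  # assumed value for a "very important" effect

p_zero = two_sided_p_value(estimate, std_error, 0.0)
p_large = two_sided_p_value(estimate, std_error, large_effect)
low, high = confidence_interval(estimate, std_error)

print(f"p-value for H0: effect = 0            : {p_zero:.3f}")
print(f"p-value for H0: effect = {large_effect}          : {p_large:.3f}")
print(f"95% confidence interval for the effect: ({low:.2f}, {high:.2f})")
```

With these made-up numbers neither null hypothesis is rejected at the 5% level, and the interval contains both zero and the large effect. Reporting only "we cannot reject zero" would hide the second half of that picture, while the interval shows both at once.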

A second point of concern is that the p-value is sometimes interpreted incorrectly as the probability that the null hypothesis is correct. On that reading, a small p-value says the null is unlikely to be true, which is why it leads to rejection. However, this is not the correct interpretation. The p-value, as we see in Chapter 8, is the probability under the null of seeing a value at least as extreme as the one we observe in the data. These two probability statements are not the same, and it is wrong to think of the p-value as the probability that the null is true. So a second reason offered for rejecting the use of p-values is that some readers might misunderstand what they mean.
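The difference between the two probability statements can be seen in a small worked example, sketched below in Python. It uses a made-up coin experiment (60 heads in 100 flips, null hypothesis of a fair coin). To compute a "probability that the null is true" at all, we have to add ingredients that the p-value never uses: a specific alternative (here, heads with probability 0.6) and a prior probability for the null. Both are assumptions chosen purely for illustration, as are the function names.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n flips when P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_p_value(heads, flips, null_prob=0.5):
    """P(result at least as far from the expected count as observed | null)."""
    expected = flips * null_prob
    observed_distance = abs(heads - expected)
    return sum(
        binomial_pmf(k, flips, null_prob)
        for k in range(flips + 1)
        if abs(k - expected) >= observed_distance - 1e-9
    )


def posterior_prob_null(heads, flips, prior_null, null_prob=0.5, alt_prob=0.6):
    """P(null | data) by Bayes' rule, for an ASSUMED single alternative and prior."""
    weight_null = prior_null * binomial_pmf(heads, flips, null_prob)
    weight_alt = (1 - prior_null) * binomial_pmf(heads, flips, alt_prob)
    return weight_null / (weight_null + weight_alt)


heads, flips = 60, 100  # hypothetical data
p_value = two_sided_p_value(heads, flips)
posterior_even_prior = posterior_prob_null(heads, flips, prior_null=0.5)
posterior_sceptical_prior = posterior_prob_null(heads, flips, prior_null=0.9)

print(f"p-value (probability of such data under the null): {p_value:.3f}")
print(f"P(null | data), prior P(null) = 0.5              : {posterior_even_prior:.3f}")
print(f"P(null | data), prior P(null) = 0.9              : {posterior_sceptical_prior:.3f}")
```

The p-value is one number fixed by the data and the null alone. The probability that the null is true changes with the assumed prior and alternative, so the first cannot be read as the second.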

Both of these reasons essentially argue that the p-value should be discarded as a tool because it might be misused. It seems a bit strong to suggest banning the tool because less statistically literate researchers and readers misunderstand how to apply and interpret it. It is reasonable, though, that a study with no hypothesis to test should not construct a hypothesis test or report a p-value.

For more views on this, see for example this general discussion from Vox or this discussion from FiveThirtyEight.