Understanding Conditional Probabilities

We examine conditional probabilities more deeply in the context of an example. One of the take-aways you should have is that the ordering of conditional probabilities matters: for any two events $A$ and $B$, $P(A|B)$ and $P(B|A)$ are typically different, and often very different, so you really need to know which way around the conditioning statement goes in order not to make a mistake.

Consider the following Venn Diagram, where the probability of each joint event is given in the relevant region. We are looking at the chances that a random person likes pizza and/or likes beer. For example, $P(\text{Pizza but not Beer})=0.5$. FIGURE HERE

We can represent the same numbers in a probability table as we see here.

| Pizza \ Beer | No | Yes |
|---|---|---|
| No | $0.1$ | $0.2$ |
| Yes | $0.5$ | $0.2$ |

With all of the probabilities of the joint events available, it is simple to construct our marginal probabilities (for example $P(\text{Yes to Pizza})$ or $P(\text{Yes to Beer})$) and straightforward to compute our conditional probabilities (for example $P(\text{Likes Pizza}|\text{Not liking Beer})$). Use the formulas directly to check that you can do this.
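As a quick check on that exercise, the table can be encoded and the marginal and conditional probabilities computed in a few lines of Python. This is just a sketch of the calculation; the joint probabilities are stored as exact fractions so the answers come out exact.

```python
from fractions import Fraction

# Joint probability table for the Pizza/Beer example, as exact fractions.
# Keys are (pizza, beer); the four entries sum to one.
joint = {("No",  "No"):  Fraction(1, 10),
         ("No",  "Yes"): Fraction(2, 10),
         ("Yes", "No"):  Fraction(5, 10),
         ("Yes", "Yes"): Fraction(2, 10)}

# Marginals: sum the joint probabilities over the other variable.
p_pizza = sum(p for (pizza, beer), p in joint.items() if pizza == "Yes")
p_beer  = sum(p for (pizza, beer), p in joint.items() if beer == "Yes")

# Conditional: the joint probability divided by the probability of the
# conditioning event, here P(Likes Pizza | Not liking Beer).
p_pizza_given_no_beer = joint[("Yes", "No")] / (1 - p_beer)

print(p_pizza, p_beer, p_pizza_given_no_beer)  # 7/10 2/5 5/6
```

Note that `Fraction` reduces automatically, so the marginal for beer prints as 2/5 rather than 4/10; they are the same number.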

First, let us think visually about what the conditional probability really is. Suppose we want to condition on those who like pizza. In terms of the Venn Diagram, this means we are no longer interested in the entire diagram, only the part where Pizza is a 'Yes'. Since we only care about this region now, we want the probabilities of liking beer or not to add to one. The formula now makes sense: we have $$ P[Beer=Yes|Pizza=Yes] = \frac{P[Beer=Yes,Pizza=Yes]}{P[Pizza=Yes]} $$ and $$ P[Beer=No|Pizza=Yes] = \frac{P[Beer=No,Pizza=Yes]}{P[Pizza=Yes]} $$ As in the main text, we divide by $P[Pizza=Yes]$ so that these probabilities sum to one.

To actually use the formula, we need the marginal probabilities. This is of course easy in a problem such as this: we have $$ P(\text{Likes Pizza}) = P(\text{Likes Pizza and Beer})+P(\text{Likes Pizza but not Beer})$$ which is equal to $0.2 + 0.5 = 0.7.$ We can easily read this from the Venn Diagram, or read across the 'Yes' row for Pizza in the table.

Now plugging in the numbers from the Venn Diagram or table we have that $$ P[Beer=Yes|Pizza=Yes] = \frac{0.2}{0.7} = \frac{2}{7} $$ and $$ P[Beer=No|Pizza=Yes] = \frac{0.5}{0.7} = \frac{5}{7}. $$

Going through the same steps allows us to work out the reverse conditional probability, i.e. that $$ P[Pizza=Yes|Beer=Yes] = \frac{0.2}{0.4} = \frac{1}{2} $$ and $$ P[Pizza=No|Beer=Yes] = \frac{0.2}{0.4} = \frac{1}{2}. $$
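The calculations in both directions can be sketched in Python, conditioning the same joint event $\{Pizza=Yes, Beer=Yes\}$ on each variable in turn (again using exact fractions, so the results match the hand calculation):

```python
from fractions import Fraction

# Joint probabilities, (pizza, beer) -> P, as exact fractions.
joint = {("Yes", "Yes"): Fraction(2, 10), ("Yes", "No"): Fraction(5, 10),
         ("No",  "Yes"): Fraction(2, 10), ("No",  "No"): Fraction(1, 10)}

p_pizza = joint[("Yes", "Yes")] + joint[("Yes", "No")]  # P(Pizza=Yes) = 7/10
p_beer  = joint[("Yes", "Yes")] + joint[("No", "Yes")]  # P(Beer=Yes)  = 4/10

# The same joint event, conditioned two different ways.
p_beer_given_pizza = joint[("Yes", "Yes")] / p_pizza    # 2/7
p_pizza_given_beer = joint[("Yes", "Yes")] / p_beer     # 1/2

print(p_beer_given_pizza, p_pizza_given_beer)  # 2/7 1/2
```

The numerator is the same in both lines; only the denominator (the marginal probability of the conditioning event) changes, and that is what makes the two answers differ.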

Clearly we have that $P[Beer=Yes|Pizza=Yes] \ne P[Pizza=Yes|Beer=Yes]$: the ordering of the conditional distributions does matter. This is because they are completely different objects; they are probabilities of very different outcomes. One of the mistakes people make when thinking casually about probability is failing to check that the event being conditioned on matches the question they actually want to answer.

This problem is so common it has plenty of names. In law it is called the prosecutor's fallacy. The idea here is that from data, say DNA evidence and previous trials, we might understand $P(\text{DNA Evidence}|\text{Innocent})$, but in the trial, if there is evidence, we want $P(\text{Innocent}|\text{DNA Evidence})$. The first of these probabilities might be very small, but that does not mean the reverse conditioning statement we care about is also small. In medicine it is known as "Zebras", which at least is a more interesting name, but it is the same problem: mistaking a conditional probability statement for its reverse. The reason for the name is a saying taught to medical students facing a patient who tests positive for a rare disease: "If you see a zebra, you will hear hoofbeats, but if you hear hoofbeats, it is probably not a zebra." Medical students in the Serengeti probably have a different name for this mistake.

One additional name that is worth going through is what is known as the 'base rate fallacy'. The base rate fallacy is mathematically similar to the issues we have been going through, but common enough that an additional example is useful. During the covid pandemic there was a lot of misinformation. One prevalent claim was that half of new cases were amongst the vaccinated and half of new cases were amongst the unvaccinated, so vaccines had no effect (the probabilities are the same). This is a classic example of the base rate fallacy: not taking into account that most people are vaccinated. Suppose the following table (made up for the purposes of this example) were correct.

| Covid \ Vaccinated | No | Yes |
|---|---|---|
| No | $0.05$ | $0.85$ |
| Yes | $0.05$ | $0.05$ |

Those falling into the trap of the fallacy see that, of the people with covid, there are equal probabilities of being in the vaccinated and the unvaccinated groups. This is just saying that $P[\text{Covid and Vaccinated}] = P[\text{Covid and Unvaccinated}] = 0.05$, and the mistake is to believe that these two probabilities indicate the relative chances. The issue is that many more people are vaccinated, so these are not the probabilities that we really want to compare. To claim that the chances are the same for vaccinated and unvaccinated people, what we really want to compare is $P[\text{Covid | Vaccinated}]$ and $P[\text{Covid | Unvaccinated}]$. These conditional probabilities are easy to work out with our formulas: we have $$ P[Covid=Yes|Vaccinated=Yes] = \frac{P[Covid=Yes,Vaccinated=Yes]}{P[Vaccinated=Yes]} = \frac{0.05}{0.9} = \frac{1}{18} $$ and $$ P[Covid=Yes|Vaccinated=No] = \frac{P[Covid=Yes,Vaccinated=No]}{P[Vaccinated=No]} = \frac{0.05}{0.1} = \frac{1}{2} $$ These are clearly not the same; being vaccinated vastly lowers the chance of covid in this example. We can see where the 'base rate' part of the name comes from: it is ignoring that the probabilities of being vaccinated and not vaccinated are very different. But the real problem is mistakenly comparing joint probabilities instead of using the correct conditional probabilities.
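The same mechanical calculation works for this made-up covid table; a short Python sketch, again with exact fractions, shows the correct comparison:

```python
from fractions import Fraction

# Made-up joint table from the text: (covid, vaccinated) -> probability.
joint = {("No",  "No"): Fraction(5, 100), ("No",  "Yes"): Fraction(85, 100),
         ("Yes", "No"): Fraction(5, 100), ("Yes", "Yes"): Fraction(5, 100)}

p_vax = joint[("No", "Yes")] + joint[("Yes", "Yes")]  # P(Vaccinated) = 9/10

# Compare the conditional probabilities, not the (equal) joint ones.
p_covid_given_vax   = joint[("Yes", "Yes")] / p_vax        # 1/18
p_covid_given_unvax = joint[("Yes", "No")]  / (1 - p_vax)  # 1/2

print(p_covid_given_vax, p_covid_given_unvax)  # 1/18 1/2
```

The two joint probabilities in the 'Yes' row are both 0.05, yet the conditional probabilities differ by a factor of nine, precisely because the base rates $0.9$ and $0.1$ differ by a factor of nine.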

For actual calculations we do not really need Bayes' rule, but it really helps in understanding the mistakes that arise when we mix up what we are conditioning on. Recall that the general formula for Bayes' rule is $$ P[Y=y|X=x] = P[X=x | Y=y] \frac{P[Y=y]}{P[X=x]}. $$ We can see from the formula why we might have $P[Y=y|X=x]$ different from $P[X=x | Y=y]$: they will be very different when $\frac{P[Y=y]}{P[X=x]}$ is far from one, i.e. when the marginal probabilities of the two events are very different.

Consider the Pizza/Beer example. The marginal probability of liking pizza is 0.7, whilst the marginal probability of liking beer is 0.4. Hence this ratio is far from one, so we get that $P[Beer=Yes|Pizza=Yes] \ne P[Pizza=Yes|Beer=Yes]$. We could work out the conditional probabilities directly, as we have full information about the joint probabilities, but using Bayes' rule we have that $$ P[Pizza=Yes|Beer=Yes] = P[Beer=Yes|Pizza=Yes] \frac{P(Pizza=Yes)}{P(Beer=Yes)} =\frac{2}{7}\frac{0.7}{0.4}=\frac{1}{2} $$ which gives a result of one half, as we calculated directly earlier.
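This Bayes' rule calculation can be sketched in a few lines of Python, starting only from the three quantities the formula needs (one conditional probability and the two marginals):

```python
from fractions import Fraction

# Known pieces from the Pizza/Beer example.
p_beer_given_pizza = Fraction(2, 7)   # P(Beer = Yes | Pizza = Yes)
p_pizza = Fraction(7, 10)             # P(Pizza = Yes)
p_beer  = Fraction(4, 10)             # P(Beer = Yes)

# Bayes' rule: P(Pizza | Beer) = P(Beer | Pizza) * P(Pizza) / P(Beer).
p_pizza_given_beer = p_beer_given_pizza * p_pizza / p_beer

print(p_pizza_given_beer)  # 1/2
```

Notice that no joint probability appears anywhere: Bayes' rule lets us flip the conditioning using only the reverse conditional and the two marginals.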

The prosecutor's fallacy, or 'Zebras', arises because in these types of examples the ratio $\frac{P[Y=y]}{P[X=x]}$ is usually not just unequal to one but very far from one. This is because one of the marginal probabilities is often very small, like having a rare disease.