You want to go out for a walk this afternoon, but you’re worried that it might rain. You turn on the television: the forecast is for rain. Should you give up on your walk?
You decide to do a little research. You go to the weather forecaster’s website and discover that they claim a 90% accuracy rate: out of 100 days on which it rained, they predicted it would rain on 90 of those days. Sounds pretty good.
Digging a little deeper you discover that out of 100 days on which it did not rain, they correctly predicted it would be dry on 80 of those days. That’s not too bad, either.
It looks like the forecaster is pretty reliable. You decide to go ahead with your walk but you take an umbrella with you, trusting the forecast of rain.
It’s bright sunshine the whole time! You didn’t need the umbrella at all!
What went wrong? You didn’t use Bayes’ theorem.
You see, it turns out that it rains only 10% of the time where you live. So in 100 days, it rains on 10 of those days. And the weather forecaster, with its 90% accuracy rate, would correctly predict rain on 9 of those 10 days.
On the other 90 days out of 100 it doesn’t rain. The weather forecaster correctly predicts 80% of dry days, so it wrongly predicts rain on the remaining 20% of them. So on 18 days the forecast would be for rain when it didn’t actually rain.
In total then, the weather forecaster predicts rain on 9 + 18 = 27 days out of 100. But on only 9 of those days does it actually rain. So the proportion of days on which it rains when the weather forecaster has predicted rain is 9/27, which is only one third. That’s pretty unreliable.
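If you want to check the counting for yourself, a few lines of Python will do it. This is only a sketch of the arithmetic: the variable names are my own, and the figures are the ones quoted above (10 rainy days in 100, rain forecast on 90% of rainy days and on 20% of dry days).

```python
# Natural-frequency check of the forecaster example, using whole days.
# All variable names are illustrative; the numbers come from the text.
days = 100
rainy_days = 10                    # it rains 10% of the time
dry_days = days - rainy_days       # 90 dry days

hits = rainy_days * 90 // 100          # rain forecast on 90% of rainy days
false_alarms = dry_days * 20 // 100    # rain forecast on 20% of dry days

rain_forecasts = hits + false_alarms   # every day with a forecast of rain
share_correct = hits / rain_forecasts  # how often a rain forecast is right

print(f"Rain forecast on {rain_forecasts} days, correct on {hits} of them")
print(f"Chance of rain given a rain forecast: {share_correct:.0%}")
```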
The impressive statistic (“90% accuracy!”) on the weather forecaster’s website was the answer to the following question: “Given that it did in fact rain, what is the probability that the forecast was for rain?”
The problem arose because this question is the wrong way round. What you really want to know is, “Given that the forecast is for rain, what is the probability that it will actually rain?” The statistic here is much less impressive: about 33%.
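For readers who like a formula, this is what Bayes’ theorem says for the forecaster. The notation is my own shorthand: R means “it rains” and F means “the forecast is for rain”.

```latex
P(R \mid F)
  = \frac{P(F \mid R)\,P(R)}
         {P(F \mid R)\,P(R) + P(F \mid \text{not } R)\,P(\text{not } R)}
  = \frac{0.9 \times 0.1}{0.9 \times 0.1 + 0.2 \times 0.9}
  = \frac{0.09}{0.27}
  = \frac{1}{3}
```

The 0.09 and 0.27 are just the 9 days and 27 days from the count above, written as fractions of 100 days.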
Why did this happen?
Although the weather forecaster often correctly predicts rain when it actually rains, it doesn’t rain very often, so the number of days on which it rains and on which rain is predicted is small (9 days). And although the weather forecaster rarely predicts rain when it doesn’t rain, there are many days on which it doesn’t rain, so there are many opportunities for an incorrect forecast (18 days out of 100).
Thus a prediction of rain is more often associated with a dry day than with a wet day. And that’s what happened to you today.
A similar problem arises in diagnostic testing for diseases: for rain read ‘disease’, for forecast read ‘diagnostic test’. The question of interest is “Given that the test is positive, what is the probability that the patient actually has the disease?”, and Bayes’ theorem is the tool for answering it.
There are two things we wish to avoid. A false positive occurs when a healthy patient is diagnosed as having the disease. (Statisticians creatively call these Type I errors.) A false negative occurs when a patient with the disease is diagnosed as being healthy. (Statisticians creatively call these Type II errors.)
Pregnancy isn’t a disease, but the picture below illustrates the distinction between the two types of error.
The answer to our question – “Given that the test is positive, what is the probability that the patient actually has the disease?” – is the ratio of ‘the number of sick patients who get a positive test result’ to ‘the number of patients (both sick and healthy) who get a positive test result’. (If you like: true positives divided by all positives.)
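To make the bookkeeping concrete, here is a small Python sketch that sorts test results into the four possible outcomes and then takes the ratio just described. The patient records are invented purely for illustration, and the function names are my own.

```python
from collections import Counter

def outcome(has_disease: bool, test_positive: bool) -> str:
    """Label one patient's result as one of the four possible outcomes."""
    if test_positive:
        # A positive result for a healthy patient is a false positive (Type I error).
        return "true positive" if has_disease else "false positive"
    # A negative result for a sick patient is a false negative (Type II error).
    return "false negative" if has_disease else "true negative"

def share_of_positives_that_are_true(records) -> float:
    """True positives divided by all positives (assumes at least one positive)."""
    counts = Counter(outcome(sick, positive) for sick, positive in records)
    all_positives = counts["true positive"] + counts["false positive"]
    return counts["true positive"] / all_positives

# Made-up records, each a pair (has_disease, test_positive).
records = (
    [(True, True)] * 10      # sick, test positive
    + [(False, True)] * 1    # healthy, test positive
    + [(True, False)] * 2    # sick, test negative
    + [(False, False)] * 87  # healthy, test negative
)

print(share_of_positives_that_are_true(records))
```

Notice that the false negatives and true negatives play no part in the ratio: only the patients with a positive result are counted.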
For this ratio to be high (i.e. for the diagnostic test to be reliable) we need the number of false positives to be very low.
For example if we have 10 true positives and 1 false positive, then the proportion of true positives is 10/11, which is very high. But if we have 10 true positives and 10 false positives, then the proportion is 10/20, which is no better than diagnosis by tossing a coin!
Problems arise when the base rate of the disease amongst people who are tested is low. In a screening programme for a rare disease, even a low rate of false positives will throw up a large number of positive test results. So many of the people tested will be healthy, and a small proportion of a large number is still a sizeable number of people, all of whom will be wrongly diagnosed. And even if the test is very good at identifying sick people, the actual number of sick people is low (because the disease is rare), so the number of true positives may not be very high. The ratio of true positives to all positives may therefore not be very high, as in my rain example.
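The effect of the base rate is easy to see numerically. The sketch below assumes a hypothetical test that catches 99% of sick people and correctly clears 95% of healthy people. Those two figures, the list of base rates and the function name are my own choices for illustration, not data about any real test.

```python
def prob_sick_given_positive(base_rate: float,
                             detection_rate: float,
                             clear_rate: float) -> float:
    """Proportion of positive results that are true positives.

    base_rate      -- fraction of tested people who have the disease
    detection_rate -- fraction of sick people who test positive
    clear_rate     -- fraction of healthy people who test negative
    """
    true_positives = base_rate * detection_rate
    false_positives = (1 - base_rate) * (1 - clear_rate)
    return true_positives / (true_positives + false_positives)

# Hypothetical test: 99% detection, 95% of healthy people correctly cleared.
for base_rate in (0.5, 0.1, 0.01, 0.001):
    p = prob_sick_given_positive(base_rate, 0.99, 0.95)
    print(f"base rate {base_rate:>6.1%}: P(sick | positive) = {p:.1%}")
```

The same function reproduces the rain example: prob_sick_given_positive(0.1, 0.9, 0.8) comes out at one third, with ‘rain’ in place of ‘sick’ and ‘forecast of rain’ in place of ‘positive’.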