In 1981, Daniel Kahneman and his long-time collaborator Amos Tversky published a study of a clever experiment that exposed how little attention people pay to base rates when making quick mental probability estimates. It is now quite famous, and you might come across it in a mathematics, economics or psychology course these days.

The Experiment

A cab was involved in a hit and run accident at night. Two cab companies, Green and Blue, operate in the city. You are given the following data:

  • 85% of the cabs in the city are Green and 15% are Blue.
  • A witness identified the cab as Blue. The court tested the reliability of the witness under the circumstances that existed on the night of the accident, and concluded that the witness correctly identified each one of the two colors 80% of the time and failed 20% of the time.

What is the probability that the cab involved in the accident was Blue rather than Green?

Have a think about how you would respond, given that you don’t have the time to sit there and work it out. What sort of percentage comes to you intuitively?

If you thought the likelihood of the cab being Blue was somewhere around 80%, you were in line with the most common responses. Unfortunately, however, you were also incorrect!

What the experiment showed was that the majority of people ignore the ‘base rate’, which is the ‘prior’ information: how common each kind of cab is before any evidence comes in. If you like, it is sort of like the denominator of a fraction. People tend to look only at the ‘numerator’ part, here the witness’s reliability, when making up their minds about likelihood. It’s almost like seeing a guy on the train with night vision goggles and then being asked ‘is that guy more likely to be an MI6 agent or an accountant?’. The circumstance makes the former answer seem more plausible in your mind, but you still have to consider that there are far more accountants than secret agents. Also, I don’t think a secret agent would be that conspicuous…

The correct answer is that the probability of the cab being Blue is 41%.

Doing the math

Getting the true answer is pretty difficult if you don’t have a pen and paper, as well as some familiarity with probability theory or a good sense of logic. But hopefully you will agree that once you take into account that 85% of the cabs were Green to begin with, you should give less weight to any ‘Blue’ sighting.

If you do know probability theory, you will recognise this problem as a Bayesian one. You can use Bayes’ Rule and work out the answer pretty quickly. However, I don’t really like using formulae because they aren’t always logically transparent. I never really know whether I can trust the result, whereas if I’ve used logical steps and reasoning, I can check for plausibility along the way. This way, it’s easier to understand and also easier to explain to someone else. Hence, I’m going to show you how to get the answer above but by using a tree diagram to work out the conditional probabilities instead. It’s longer than plugging numbers into a formula, but it should make more sense.
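For anyone who would rather see the formula route first, here it is as a quick sketch. The labels are shorthand introduced for this note rather than anything from the original problem: B and G mean the cab is Blue or Green, and S_B means the witness says ‘Blue’.

```latex
P(B \mid S_B)
  = \frac{P(S_B \mid B)\,P(B)}{P(S_B \mid B)\,P(B) + P(S_B \mid G)\,P(G)}
  = \frac{0.8 \times 0.15}{0.8 \times 0.15 + 0.2 \times 0.85}
  = \frac{0.12}{0.29} \approx 0.41
```

The tree diagram arrives at the same 0.12 and 0.29, just one branch at a time.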

[Figure: cab1]

Let’s start with the cab in question. Obviously, the first bit of information tells us the cab is 85% likely to be Green (G) and 15% likely to be Blue (B), i.e. probabilities of 0.85 and 0.15 respectively.

Suppose the cab was G. This is the case 0.85 of the time. The witness can then be either correct or incorrect in his identification, with probabilities 0.8 and 0.2 respectively. If he is correct, he says ‘Green’; if he is wrong, he says ‘Blue’. To get the probability of a sequence of events, you multiply the probabilities along the branch. So the cab is Green and he says Green with probability 0.85 x 0.8 = 0.68, and the cab is Green and he says Blue with probability 0.85 x 0.2 = 0.17.

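We can repeat this same analysis for the case when the cab is actually Blue. All we do then is replace the 0.85 with 0.15 in our calculations: he says Blue (correctly) with probability 0.15 x 0.8 = 0.12, and Green (wrongly) with probability 0.15 x 0.2 = 0.03. This might sound complicated, but the diagram below should help: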

[Figure: cab2 (tree diagram)]

We can check that this is correct by making sure the probabilities at the ends of our branches all sum up to 1. This is because we have exhausted all possible cases that can occur. 0.68 + 0.17 + 0.12 + 0.03 = 1, so we’re all good.
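If you would rather let a computer do the multiplying, the four branch ends can be rebuilt in a few lines of Python. This is only a sketch: the variable names are my own labels, and it uses exact fractions so the sum-to-1 check is not thrown off by floating-point rounding.

```python
from fractions import Fraction

# Figures from the problem statement.
p_green = Fraction(85, 100)    # share of cabs that are Green
p_blue = Fraction(15, 100)     # share of cabs that are Blue
p_correct = Fraction(80, 100)  # witness names the right colour
p_wrong = 1 - p_correct        # witness names the wrong colour

# One entry per branch end: (actual colour, colour the witness reports).
branches = {
    ("Green", "Green"): p_green * p_correct,
    ("Green", "Blue"): p_green * p_wrong,
    ("Blue", "Blue"): p_blue * p_correct,
    ("Blue", "Green"): p_blue * p_wrong,
}

for (actual, said), p in branches.items():
    print(f"cab is {actual}, witness says {said}: {float(p):.2f}")

# The four cases cover everything that can happen, so they must sum to 1.
assert sum(branches.values()) == 1
```

The loop prints 0.68, 0.17, 0.12 and 0.03, matching the ends of the branches.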

Now we have to answer the actual question: given what we know, what is the probability that the cab was Blue?

Recall that the witness said he saw a Blue cab. This corresponds to our 2 inner branches. He could have seen a Blue cab and been correct (0.12), or he could have actually seen a Green cab and been wrong (0.17). So given that he said he saw a Blue cab, the ratio of it being Blue to Green is 0.12:0.17 or 12:17. To turn this ratio into a probability, divide the Blue part by the total (12 + 17 = 29): the cab is actually Blue 12/29 of the time. 12/29 = 0.41379… which rounds to 41%, and we have the answer.
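The same last step can be checked in Python. Again this is just a sketch with names I have made up for the two ‘witness says Blue’ branches; exact fractions keep the 12/29 visible.

```python
from fractions import Fraction

# Branch ends in which the witness says "Blue" (values from the tree).
blue_and_says_blue = Fraction(15, 100) * Fraction(80, 100)   # 0.12
green_and_says_blue = Fraction(85, 100) * Fraction(20, 100)  # 0.17

# Divide the branch we care about by the total of all matching branches.
p_blue_given_says_blue = blue_and_says_blue / (
    blue_and_says_blue + green_and_says_blue
)

print(p_blue_given_says_blue)                  # 12/29
print(f"{float(p_blue_given_says_blue):.1%}")  # 41.4%
```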

With this diagram, we can answer other questions about this scenario if we want to. As an example, suppose the witness said he saw Green instead. Now what is the probability the cab is Blue? Simple: the two ‘says Green’ branches are 0.03 (Blue, witness wrong) and 0.68 (Green, witness right), so the ratio is 3:68 and the probability of it being Blue is 3/71 ≈ 4%. And here we see the power of the base rate. The initial proportion of taxis is vital in these calculations, but is easily overlooked when we’re making a quick judgement.
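To play with the base rate yourself, the whole tree can be wrapped in one small function. This is a sketch, not anything from the original study: the function name `p_blue` and its parameters are my own, and the 50/50 fleet in the last line is a made-up scenario for comparison.

```python
def p_blue(said: str, base_rate_blue: float = 0.15, accuracy: float = 0.80) -> float:
    """Probability the cab is Blue, given the colour the witness reported."""
    base_rate_green = 1 - base_rate_blue
    if said == "Blue":
        blue_branch = base_rate_blue * accuracy          # Blue cab, witness right
        green_branch = base_rate_green * (1 - accuracy)  # Green cab, witness wrong
    elif said == "Green":
        blue_branch = base_rate_blue * (1 - accuracy)    # Blue cab, witness wrong
        green_branch = base_rate_green * accuracy        # Green cab, witness right
    else:
        raise ValueError("said must be 'Blue' or 'Green'")
    # Keep only the branches that match the report, then take Blue's share.
    return blue_branch / (blue_branch + green_branch)


print(round(p_blue("Blue"), 3))                      # 0.414
print(round(p_blue("Green"), 3))                     # 0.042
print(round(p_blue("Blue", base_rate_blue=0.5), 3))  # 0.8
```

The last line shows that the intuitive 80% answer would only be right if the two companies had equally many cabs on the road.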