Mark Twain popularized the saying that there are three kinds of lies: “lies, damned lies, and statistics.” It’s a quote a lot of people like to invoke when statistics don’t line up with public perceptions, or when they seem to muddy the waters and make a given phenomenon harder to understand. We are bombarded with statistics: about the COVID-19 pandemic, about the upcoming election, about climate change, in marketing materials, and so on. But how do we understand this cascade of numbers? For the most part, we don’t. “Shock” headlines are partly to blame: they use statistics without context, obscuring rather than revealing the complete picture.
By way of example, let’s look at a common statistic often used by insurance companies: “3 out of every 4 accidents happen within 15 miles of your home.” Yikes! This sounds as if we are LESS safe driving around our hometowns than when we are on vacation. After all, only 1 in 4 accidents happens away from home, so vacation driving is safer than driving around home, right? Not necessarily. We can get a misleading picture if we look at these numbers in isolation. In the data world, drawing a conclusion this way, without the surrounding context, is called the “base rate fallacy.”
In probability and statistics, context is often provided by the base rate: the underlying probability of something before we condition on any other information. In the car accident example, the base rate is the share of all driving that takes place within 15 miles of your home. 3 out of 4 accidents may happen near home, but how much of our driving do we actually do there? About 95% for most people! Most accidents happen near our homes because that’s where we do the vast majority of our driving. If only 75% of accidents occur in the area where you do 95% of your driving, while the other 25% are packed into the remaining 5%, then driving close to home is roughly six times safer, mile for mile, than driving away from it.
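To make that comparison concrete, here is a minimal sketch of the arithmetic in Python. It assumes the two figures used above (75% of accidents and roughly 95% of driving happen near home) and treats them as illustrative rather than measured; the function and variable names are ours, chosen only for this example.

```python
# Illustrative figures taken from the text above; they are assumptions, not measured data.
ACCIDENT_SHARE_NEAR_HOME = 0.75  # share of all accidents that happen near home
DRIVING_SHARE_NEAR_HOME = 0.95   # share of all driving done near home (the base rate)


def relative_accident_rate(accident_share: float, driving_share: float) -> float:
    """Accidents per unit of driving, relative to the overall average (1.0 = average)."""
    if not 0 < driving_share <= 1:
        raise ValueError("driving_share must be in the interval (0, 1]")
    return accident_share / driving_share


near_home = relative_accident_rate(ACCIDENT_SHARE_NEAR_HOME, DRIVING_SHARE_NEAR_HOME)
away = relative_accident_rate(1 - ACCIDENT_SHARE_NEAR_HOME, 1 - DRIVING_SHARE_NEAR_HOME)
risk_ratio = away / near_home  # how many times riskier a mile away from home is

print(f"Near home: {near_home:.2f}x the average accident rate")  # 0.79x
print(f"Away from home: {away:.2f}x the average accident rate")  # 5.00x
print(f"Driving away from home is about {risk_ratio:.1f} times riskier")  # 6.3
```

Dividing each accident share by the matching driving share is what brings the base rate back into the picture: the headline number alone (75%) says nothing about risk until it is compared with how much driving it covers.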
So why does this matter? Well, ignoring the base rate may lead you to the wrong conclusion, as in the car accident example. These days, you can’t open a newspaper or listen to a television broadcast without being immediately confronted with statistical claims about the health crisis, the direction of the economy, political trends, etc. The vast majority of the time, these statistics are presented without the base rate, which renders them virtually meaningless. We understand that it’s not realistic to expect news media to explain statistical principles in every article. But these numbers dictate policy, which in turn affects all of our lives. What can we do? We can become more savvy data readers. In an era when we’re constantly confronted with numbers, percentages, flow charts, and the like, we need to look more deeply into their context and be ever more careful about the conclusions we draw. Because just like words, numbers can be deceiving.