The other day I signed up to become a seer. Over the next four years, I hope to learn to make better forecasts — political, economic, cultural, maybe even personal. Along with a few thousand other people, I will “cultivate the fine art of well-calibrated judgment,” in the words of Philip Tetlock, a psychologist at the Wharton School at the University of Pennsylvania. How I will hone my skills, and how well, I can’t predict. I have no clairvoyant talent. But then Tetlock’s premise in this venture is that today no one really knows how to make sound predictions.
That’s not what leaders want to hear, of course, because their work, as the late British politician Wayland Young once said, is “the art of taking good decisions on insufficient evidence.” Naturally, leaders have always wanted to tame the future’s uncertainty. In the ancient world, sovereigns and generals didn’t make a move until they had read tea leaves, or bird entrails, or cloud formations, or, of course, their horoscopes. In our time, things are supposed to be different, and forecasts now are touted as a combination of relevant data, logical analysis and special insight — born of experience, a powerful theory or some skilled math. Certainly, there are plenty of products offered in “the forecasting aisle in the supermarket of ideas,” as Tetlock calls it, and they all tout their rigor and power.
Yet every year brings shocks that contradict or surprise the experts. In 2011 alone, we’ve had the Arab world’s sequence of revolutions; the failure in Japan of nuclear-safety measures that were predicted to withstand all possible disasters; and the discovery of Osama bin Laden still wielding influence in a comfortable resort town, not a far-off cave. As with the financial meltdown of 2008 or the terrorist attacks of 2001, the clues proved easy to read — “in the rearview mirror, not through the windshield,” as General David Petraeus has said of turning points in war.
With all our digital-age information and analytic tools, why can’t we get more leverage on the future? Tetlock has been working on that question for a long time. Over two decades, starting in 1983, he and his colleagues collected 82,361 distinct predictions on political trends from 284 professional advisers and commentators. Comparing all these forecasts with actual events, the researchers ended up confirming the C-suite’s worst fears: Experts were no better at predicting what would happen in their field than they were at predicting what would happen in areas where they had no special expertise. In other words, when the metric was basic accuracy — did this happen or not? — expertise added no value. For any given field, from China to macroeconomics to diplomacy, an intelligent newspaper reader was as good as the pundit.
Then, too, the researchers often asked their experts to pick among three choices for a political or economic trend — more of it, the same as now, or less of it. That gave them a way to compare success rates with other means of picking among those options. The results were not inspiring. If you had used a very dull algorithm (for instance, “always predict more of the same”) or even if you had used a chimp tossing darts at a three-color target, you would have done better than the experts.
So it seems that today’s leaders, for all their white papers and PowerPoint projections, are no better off than the Roman consuls who relied on a haruspex and the oracle at Delphi.
The trouble is a combination of physics and psychology. The physical aspect, as Tetlock points out, is simply a consequence of complexity: In complex systems, each part is affected by the actions of many others in a constantly changing dynamic. Whether the complex system is the world’s weather or a market or a global industry, there is a point on the future horizon beyond which it’s simply impossible to calculate all the possibilities. That’s why forecasts about what is likely to happen tomorrow are more reliable than those about what will happen in six months.
But a much bigger problem arises from the shape of the human mind. We’re not good at seeing the future clearly, and even worse at dealing with the proof that we aren’t good at it.
For one thing, people have a bias in favor of the status quo. We tend to think that what has been will be, and that large-scale shocks are rare and nearly impossible. This is why conventional wisdom tends to predict incremental, reassuring forms of change: no shocks, no collapses, no extremists, no big surprises. (Decades later, of course, the results can look hilarious, like science fiction stories that foresaw videophones and rocketry for tourists, but imagined that these new technologies would be used by pipe-smoking dads and their 1950s-style homemaker wives.) One particularly modern form of this familiarity bias is a strong preference for the normal distribution, in which values near the average are the most common and “outliers” sit in the long thin tails of the bell curve. We tell ourselves extreme events are rare. In fact, extreme events (like Pacific Rim earthquakes or recurrent revolts against dictators in the Mideast this year) can happen repeatedly, and in quick succession. It’s simply a fact of nature and of history that many events don’t have a normal distribution. But many of us are reluctant to give up that reassuring assumption.
We also tend to forget that for most of the predictions we care about — the state of the world economy in 2012, the political stability of any particular nation, the likely outcome of a war — the “objects” of the forecast are people. So when we estimate whether traffic on the Interstate will be heavy or light, for example, or whether “the market” is good for selling a stock, we have to predict what other people are predicting.
Whether you are a guerrilla trying to decide when to strike a military outpost, or a broker feeling out the mood of the exchange, or just someone trying to guess if your favorite bar will be too crowded, the University of Miami’s Neil Johnson says, your decision relies on what you know about everybody else’s choice. Predictions often fail because they don’t take into account the effect of other predictions. Tetlock cites the example of a contest run years ago by The Financial Times. Readers were asked to guess what number from 0 to 100 would equal two-thirds of the average guess of all the other entrants. Assuming that other people will guess at random, the average of their guesses should be 50, and two-thirds of that is 33. Many readers offered that answer. They were way off, because they failed to take into account that other predictors were also trying to psych out the strategies of other people. The real number was lower.
Finding the right number required some knowledge about the minds of competitors. If every single one was well-versed in game theory, Tetlock points out, they would guess zero (knowing that everyone converges on 33 makes 22 the right answer; but if everyone converges on 22 then the right answer is about 14.7, and so on down to zip). But not all participants are equal in their expertise, so the theoretically right answer wasn’t accurate either. In fact, the correct answer turned out to be 18. That, Tetlock points out, was about halfway between the naïve answer (33) and the too-clever one (0).
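For readers who want to watch that chain of second-guessing unwind, here is a minimal sketch in Python. The function name and parameters are invented for illustration and are not from Tetlock or The Financial Times. It assumes the first-level guesser expects an average of 50, and it does not round between steps, so its figures differ slightly from the hand-rounded ones quoted above.

```python
def iterated_guesses(start=50.0, fraction=2 / 3, levels=5):
    """Return the guess produced at each level of "I think that you think" reasoning.

    start    -- assumed average if everyone guessed at random (hypothetical default)
    fraction -- the share of the average that wins the contest
    levels   -- how many rounds of second-guessing to carry out
    """
    guesses = []
    current = start
    for _ in range(levels):
        # Each level assumes everyone else stopped one level earlier.
        current *= fraction
        guesses.append(round(current, 1))
    return guesses


print(iterated_guesses())  # → [33.3, 22.2, 14.8, 9.9, 6.6]
```

The winning answer of 18 falls between the second and third levels of this ladder, which fits the point that the typical entrant looked a step or two ahead but not all the way to zero.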
Of course, experts are supposedly trained to resist the biases and blind spots of the ordinary mind. So why don’t they? Part of the reason is that they share other forms of prejudice with the rest of us. One, for example, is a strong unconscious preference for information that jibes with what we already believe. This is what psychologists call “confirmation bias”: What shores up our beliefs is easy to accept, while that which challenges them is easy to suspect.
Dan Kahan at Yale Law School, along with Donald Braman and Hank Jenkins-Smith, mapped how this works in a series of ingenious experiments around a sensible question: When there is a genuine scientific consensus on some topic, why do people simultaneously (a) say they believe in science and (b) continue to disagree about the facts?
To examine this, the researchers showed people the photos and credentials of various fictional scientific experts and described their views about various hot-button topics.
Firearms policy was one. If a state’s laws permit almost anyone to carry a concealed gun, liberals are likely to say that this will cause a rise in the crime rate, as people blast away at one another. Conservatives are likely to say it will cause a decrease in crime, as citizens will have the means to defend themselves.
Kahan and his colleagues worked out a way to sort people according to basic values, which correspond pretty well to American conventions about who is left-leaning (egalitarian and communal values) and who inclines to the right (hierarchical and individualistic). When they gave their leftish volunteers some writing that showed “Prof. James Williams” had found that “concealed carry” laws lead to violence, 80 percent of the liberals agreed that Williams was a trustworthy expert, whose book they would recommend. When other lefties read a different passage in which Williams concluded that these laws decrease crime, half of them said he wasn’t trustworthy. On the other hand, more than 80 percent of conservatives who read that passage endorsed the sound expertise of the learned Dr. Williams.
Gun policy is a particularly nice test for ideological effects because, Kahan says, neither side is correct: According to a National Research Council review in 2004, “concealed carry” laws don’t have much effect on crime in either direction — they don’t increase it, and they don’t reduce it. But many people prefer to defend their worldview rather than accept that conclusion.
Another pitfall of human nature that affects experts is this: We favor explanations that fall together into a satisfying story. As Hollywood has long known, a convincing narrative trumps logic and common sense. In the prediction game, this makes a forecast appealing when it has multiple interlocking events — like “revolutionary political change in Congo leading to a spike in price for cobalt.” Intuitively, the mind gravitates to this sequence of cause and effect, and it is more convincing than a bare prediction of “revolution in Congo.” Logically, though, the occurrence of two separate events is less likely than the occurrence of one.
In the early 1980s, this effect emerged in an experiment with foreign policy experts, who had been divided into two groups. One group was asked for the probability of “a complete suspension of diplomatic relations between the U.S. and the Soviet Union, sometime in 1983.” The other was asked about chances of “a Russian invasion of Poland, and a complete suspension of diplomatic relations between the U.S. and the Soviet Union, sometime in 1983.” The second chain of events was judged more likely than the first, even though the second, requiring two separate events to happen, was certainly less likely according to basic probability theory.
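The arithmetic behind that last point can be sketched in a few lines of Python. The probabilities below are made up purely for illustration and are not figures from the experiment, and the function name is hypothetical. The point is only that "invasion and suspension" is one of the ways a suspension can happen, so it can never be more probable than suspension overall.

```python
def suspension_probabilities(p_invasion, p_susp_given_invasion, p_susp_given_no_invasion):
    """Return (P(invasion and suspension), P(suspension)) for the given inputs."""
    # The two-event story: an invasion happens AND relations are suspended.
    joint = p_invasion * p_susp_given_invasion
    # Suspension overall also counts the cases where no invasion occurred.
    total = joint + (1 - p_invasion) * p_susp_given_no_invasion
    return joint, total


# Invented numbers: a 10% chance of invasion, a 90% chance of suspension
# if it happens, and a 5% chance of suspension if it does not.
joint, total = suspension_probabilities(0.10, 0.90, 0.05)
print(f"invasion and suspension: {joint:.3f}")
print(f"suspension by any route: {total:.3f}")
```

Whatever numbers are plugged in, the first figure cannot exceed the second, because the second is the first plus a term that is never negative.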
Experts wrestling with these biases aren’t any better armed than others. In fact, their expertise can be a disadvantage, researchers say. That’s because expertise can provide extra layers of resistance to correction.
Detailed knowledge can buttress overconfidence and resistance to contradictory or new evidence. When the amateur newspaper reader draws a blank on economic policy or Asian history, he or she has to admit to not having all the facts. When you have many facts at your fingertips, though, you’re used to filling in blanks from memory. And memory is a flattering servant, happy to make up material.
A recent study, to be published this fall in the Journal of Consumer Research, has dubbed this phenomenon “expertise-induced false recall.” The authors (the business school professors Ravi Mehta of the University of Illinois at Urbana-Champaign, JoAndrea Hoegg of the University of British Columbia and Amitav Chakravarti of New York University) asked 113 undergraduates to compare the merits of two gaming consoles that the researchers had invented. The students had a chance to look at specs for the two devices, and then made their evaluations from memory. The researchers also asked the students to answer questions about how well they knew PlayStations, Xboxes and other aspects of gaming, so the students could be separated later into “novice” and “expert” groups. Further, the researchers made it plain to the volunteers that other people would use their advice in making a console purchase.
The knowledgeable students responded differently from their novice peers. Specifically, they were more likely to mistakenly say that both consoles shared a feature that belonged only to one. They were also more likely to say a console had a feature that neither one had. Their mistakes seemed to be a product of expert knowledge (I know this stuff, I don’t need to check) and expert responsibility (if I am comparing two items, they had better be comparable, or I can’t use my expertise). When the experiment was repeated without the element of responsibility, so that the “experts” weren’t worried that people would rely on their advice, the proportion of false memories decreased.
These experiments suggest, then, that the savviest and most knowledgeable people are the ones with the most resources for fooling themselves. As one Victorian historian said of another, “his capacity for synthesis, and his ability to dovetail the various parts of the evidence ... carried him into a more profound and complicated elaboration of error than some of his more pedestrian predecessors.”
On top of the cognitive tricks that the mind can play on itself, there are also social motives for defending one’s forecasts instead of fixing them. Tetlock, for one, argues that forecasters’ incentives are aligned against testing and improvement. If you want to be hired for a job, you don’t announce that no one seems to be very good at it. Instinctively, Tetlock believes, people in the prediction business hedge their bets — setting themselves up to look as if they are bolder and edgier than they really are. The trick is to make a bold proclamation — “China will fissure into separate nations by 2050” — and meld it to an explanatory apparatus that leaves you with a way out if you’re wrong. (“It will fissure unless these steps are taken,” or “it will fissure given current trends, though things could change.”) This two-step dance serves the consumer of forecasts, by upholding the reassuring notion that forecasts are reliable. And it serves the producers by giving them a way to stay in business even when they’re wrong, Tetlock argues.
All of this can sound like a recipe for despair, but it needn’t be. It can be a recipe for a line of useful work: Once we admit that forecasting as we know it doesn’t work, we can clear the slate and try to see what does. That’s the point of the research that Tetlock and his colleagues have set up as the Good Judgment Project.
Over the next four years, more than 2,000 members of the project will make 100 forecasts a year, answering questions like “How many nations in the euro zone will default on bonds in 2011?” and “Will southern Sudan become an independent nation this year?” Researchers on the project, which is run by Tetlock and the psychologists Barbara Mellers and Don Moore, will evaluate the forecasts, looking for ways that individual judgments can be most effectively combined into the best form of “wisdom of crowds.” They will also train individuals in different forecasting approaches. That way, as time passes, they’ll be able to compare and contrast the success rates of different techniques.
The team will be competing against others in a tournament of forecasts, sponsored by the Intelligence Advanced Research Projects Activity, a research arm of the United States intelligence community. The aim is simply empirical: at the collective level, find the best ways to combine individual perspectives and arrive at a prediction; at the individual level, find the best methods of training predictors to be better at their art. Amazingly, nothing like this has ever been attempted. Where will it lead? I find it hard to say, though I hope, four years hence, to have a better idea of how to make a guess.
David Berreby (email@example.com) is the author of “Us and Them: The Science of Identity” (University of Chicago Press, 2008). He writes the Mind Matters blog for Bigthink.com and has written about the science of behavior for a number of leading publications.