Sunday, May 1, 2016

Predicting, forecasting, and superforecasting


I have expressed a lot of reservations about the feasibility of predicting large, important outcomes in the social world (link, link, link). Here are a few observations drawn from these earlier posts:
We sometimes think that there is fundamental stability in the social world, or at least an orderly pattern of development to the large social changes that occur.... But really, our desire to perceive order in the things we experience often deceives us. The social world at any given time is a conjunction of an enormous number of contingencies, accidents, and conjunctures. So we shouldn't be surprised at the occurrence of crises, unexpected turns, and outbreaks of protest and rebellion. It is continuity rather than change that needs explanation. 
Social processes and causal sequences have a wide range of profiles. Some social processes -- for example, population size -- are continuous and roughly linear. These are the simplest processes to project into the future. Others, like the ebb and flow of popular names, spread of a disease, or mobilization over a social cause, are continuous but non-linear, with sharp turning points (tipping points, critical moments, exponential takeoff, hockey stick). And others, like the stock market, are discontinuous and stochastic, with lots of random events pushing prices up and down. (link)
One reason for the failure of large-scale predictions about social systems is the complexity of causal influences and interactions within the domain of social causation. We may be confident that X causes Z when it occurs in isolated circumstances. But it may be that when U, V, and W are present, the effect of X is unpredictable, because of the complex interactions and causal dynamics of these other influences. This is one of the central findings of complexity studies -- the unpredictability of the interactions of multiple causal powers whose effects are non-linear. 
Another difficulty -- or perhaps a different aspect of the same difficulty -- is the typical fact of path dependency of social processes. Outcomes are importantly influenced by the particulars of the initial conditions, so simply having a good idea of the forces and influences the system will experience over time does not tell us where it will wind up. 
Third, social processes are sensitive to occurrences that are singular and idiosyncratic and not themselves governed by systemic properties. If the winter of 1812 had not been exceptionally cold, perhaps Napoleon's march on Moscow might have succeeded, and the future political course of Europe might have been substantially different. But variations in the weather are not themselves systemically explicable -- or at least not within the parameters of the social sciences.
Fourth, social events and outcomes are influenced by the actions of purposive actors. So it is possible for a social group to undertake actions that avert the outcomes that are otherwise predicted. Take climate change and rising ocean levels as an example. We may be able to predict a substantial rise in ocean levels in the next fifty years, rendering existing coastal cities largely uninhabitable. But what should we predict as a consequence of this fact? Societies may pursue different strategies for evading the bad consequences of these climate changes -- retreat, massive water control projects, efforts at atmospheric engineering to reverse warming. And the social consequences of each of these strategies are widely different. So the acknowledged fact of global warming and rising ocean levels does not allow clear predictions about social development. (link)
When prediction and expectation fail, we are confronted with a "surprise".
So what is a surprise? It is an event that shouldn't have happened, given our best understanding of how things work. It is an event that deviates widely from our most informed expectations, given our best beliefs about the causal environment in which it takes place. A surprise is a deviation between our expectations about the world's behavior, and the events that actually take place. Many of our expectations are based on the idea of continuity: tomorrow will be pretty similar to today; a delta change in the background will create at most an epsilon change in the outcome. A surprise is a circumstance that appears to represent a discontinuity in a historical series. 
It would be a major surprise if the sun suddenly stopped shining, because we understand the physics of fusion that sustains the sun's energy production. It would be a major surprise to discover a population of animals in which acquired traits are passed across generations, given our understanding of the mechanisms of evolution. And it would be a major surprise if a presidential election were decided by a unanimous vote for one candidate, given our understanding of how the voting process works. The natural world doesn't present us with a large number of surprises; but history and social life are full of them. 
The occurrence of major surprises in history and social life is an important reminder that our understanding of the complex processes that are underway in the social world is radically incomplete and inexact. We cannot fully anticipate the behavior of the subsystems that we study -- financial systems, political regimes, ensembles of collective behavior -- and we especially cannot fully anticipate the interactions that arise when processes and systems intersect. Often we cannot even offer reliable approximations of what the effects are likely to be of a given intervention. This has a major implication: we need to be very modest in the predictions we make about the social world, and we need to be cautious about the efforts at social engineering that we engage in. The likelihood of unforeseen and uncalculated consequences is great.  
And in fact commentators are now raising exactly these concerns about the 700 billion dollar rescue plan currently being designed by the Bush administration to save the financial system. "Will it work?" is the headline; "What unforeseen consequences will it produce?" is the subtext; and "Who will benefit?" is the natural followup question. 
It is difficult to reconcile this caution about the limits of our rational expectations about the future based on social science knowledge, with the need for action and policy change in times of crisis. If we cannot rely on our expectations about what effects an intervention is likely to have, then we can't have confidence in the actions and policies that we choose. And yet we must act; if war is looming, if famine is breaking out, if the banking system is teetering, a government needs to adopt policies that are well designed to minimize the bad consequences. It is necessary to make decisions about action that are based on incomplete information and insufficient theory. So it is a major challenge for the theory of public policy, to attempt to incorporate the limits of knowledge about consequences into the design of a policy process. One approach that might be taken is the model of designing for "soft landings" -- designing strategies that are likely to do the least harm if they function differently than expected. Another is to emulate a strategy that safety engineers employ when designing complex, dangerous systems: to attempt to de-link the subsystems to the extent possible, in order to minimize the likelihood of unforeseeable interactions. (link)
One person who has persistently tried to answer the final question posed here -- the conundrum of forming expectations in an uncertain world as a necessary basis for action -- is Philip Tetlock. Tetlock's decades-long research on forecasting and expert judgment is highly relevant to this topic. The recent book Superforecasting: The Art and Science of Prediction provides an excellent summary of the primary findings of the research that he and his collaborators have done on the topic.

Tetlock does a very good job of tracing through the sources of uncertainty that make projections and forecasts of the future so difficult. The uncertainties mentioned above are all discussed in Superforecasting, and he supplements these objective sources of uncertainty with a substantial body of recent work on cognitive biases that lead to over- or under-confidence in a set of expectations. (The work of both Daniel Kahneman and Scott Page receives astute discussion in the book.)

But in spite of these reasons to be dubious about pronouncements about future events, Tetlock finds that there are good theoretical and empirical reasons for believing that a modest amount of forecasting of complex events is nonetheless possible. He takes very seriously the probabilistic nature of social and economic events, so a forecast that "North Korea will perform a nuclear test within six months" must be understood both as a probabilistic statement about the world (there is a specific likelihood of such a test) and as a Bayesian statement about the forecaster's degree of confidence in the prediction. Good forecasters aim to be specific about both probabilities: for example, "I have a 75% level of confidence that there is a 55% likelihood of a North Korean nuclear test by date X."
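To make the two-level structure concrete, here is a minimal sketch of one way such a statement might be read -- my own illustration, not Tetlock's formalism. The forecaster's uncertainty about the event probability is represented as a Beta distribution whose mean is the point forecast and whose spread reflects the forecaster's confidence; the parameters are invented for the example.

    import random

    # Hypothetical parameters: Beta(11, 9) has mean 11/20 = 0.55, standing in
    # for "a 55% likelihood" held with moderate rather than perfect confidence.
    alpha, beta = 11, 9
    draws = sorted(random.betavariate(alpha, beta) for _ in range(10000))

    # The central 75% of the simulated draws gives a rough "75% confidence band"
    # around the point forecast.
    low, high = draws[1250], draws[8750]
    print(f"point forecast: 0.55, 75% band: ({low:.2f}, {high:.2f})")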

Moreover, Tetlock argues that it is possible to evaluate individual forecasters on the basis of their performance on specific forecasting tasks once the outcomes have been observed. Tetlock would like to see the field of forecasting follow medicine in the direction of an evidence-based discipline in which practices and practitioners are constantly assessed and thereby enabled to improve their performance. (As he points out, it is not difficult to assess the weather forecaster on his or her probabilistic forecasts of rain or sun.) The challenge for evaluation is to set clear standards of specificity for the terms of a forecast, and then to test the forecasts against the observed outcomes once the time has expired. This is the basis for the multi-year forecasting tournaments that the Good Judgment Project has conducted. The Brier score serves as a way of measuring the accuracy of a set of probabilistic statements (link). Here is an explanation of "Brier scores" in the context of the Good Judgment Project (link): "standardized Brier scores are calculated so that higher scores denote lower accuracy, and the mean score across all forecasters is zero". As the graph demonstrates, there is a wide difference between the best and the worst forecasters, given their performance over 100 forecasts.
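To make the scoring rule concrete, here is a minimal sketch of the ordinary binary Brier score -- my own illustration, not the Good Judgment Project's code; the standardized scores quoted above are a further transformation of this raw quantity.

    def brier_score(forecasts, outcomes):
        """Mean squared difference between forecast probabilities (0 to 1)
        and observed outcomes (1 if the event occurred, 0 if not).
        0 is perfect; a constant 50/50 forecast earns 0.25."""
        return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

    # A forecaster who leans the right way scores closer to 0 than one who
    # always hedges at 0.5 (numbers invented for illustration).
    print(round(brier_score([0.9, 0.9, 0.2], [1, 1, 0]), 3))   # 0.02
    print(round(brier_score([0.5, 0.5, 0.5], [1, 1, 0]), 3))   # 0.25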


So how is forecasting possible, given all the objective and cognitive barriers that stand in the way? Tetlock's view is that many problems about the future can be broken down into component problems, some of which have more straightforward evidential bases. So instead of asking whether North Korea will test another nuclear device by November 1, 2016, the forecaster may ask a set of somewhat easier questions: How frequent have its tests been in the past? Does it have the capability to conduct another test? Would China's opposition to further tests be decisive?
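As an illustration of how such component judgments might be assembled -- a hedged sketch in the spirit of the book's advice to start from the "outside view" and then adjust, with all numbers invented -- the forecaster can anchor on a historical base rate and move part of the way toward a judgment based on case-specific information:

    # All numbers below are invented for illustration.
    base_rate = 0.40      # outside view: share of comparable six-month windows with a test
    inside_view = 0.70    # inside view: judgment from current capability, rhetoric, politics
    weight = 0.5          # how far to move from the base rate toward the inside view

    forecast = base_rate + weight * (inside_view - base_rate)
    print(round(forecast, 2))   # 0.55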

Tetlock argues that the best forecasters do several things: they avoid getting committed to a single point of view; they consider conflicting evidence freely; they break a problem down into the components that would need to be satisfied for the outcome to occur; and they revise their forecasts when new information becomes available. They are foxes rather than hedgehogs. He doubts that superforecasters are distinguished by uniquely superior intelligence or world-class subject expertise; instead, they are methodical analysts who gather data and estimates about the various components of a problem and assemble their findings into a combined probability estimate.
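Revision in the light of new evidence can be pictured as Bayesian updating. Here is a minimal sketch -- my own illustration with invented likelihood ratios, not a procedure taken from the book:

    def update(prob, likelihood_ratio):
        """Bayes' rule in odds form: multiply the prior odds by the likelihood
        ratio P(evidence | event) / P(evidence | no event)."""
        odds = prob / (1 - prob)
        odds *= likelihood_ratio
        return odds / (1 + odds)

    p = 0.55                # starting forecast
    p = update(p, 2.0)      # a piece of evidence that mildly favors a test
    p = update(p, 0.5)      # a later signal that mildly cuts against it
    print(round(p, 2))      # 0.55 -- these two updates cancel exactly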

The author follows his own advice by taking conflicting views seriously. He presents both Daniel Kahneman and Nassim Taleb as experts who have made significant arguments against the program of research represented by the Good Judgment Project. Kahneman consistently raises questions about the forms of reasoning and cognitive processes that are assumed by the GJP. More fundamentally, Taleb raises questions about the project itself. Taleb argues in several books (The Black Swan: The Impact of the Highly Improbable as well as the more recent Antifragile: Things That Gain from Disorder) that fundamentally unexpected events are key to historical change, and that the incremental forms of forecasting practiced in the GJP are therefore incapable in principle of keeping up with change. These arguments resonate with the view of change presented in earlier posts and quoted above, and I have some sympathy for the view. But Tetlock does a good job of establishing that the situation is not nearly so polarized as Taleb asserts. Many "black swan" events (like the 9/11 attacks) can be treated in a more disaggregated way and are amenable to a degree of forecasting along the lines advocated in the book. So it is a question of degree: whether the in-principle unpredictability of major events or the incremental accumulation of many small causes accounts for the preponderance of historical change. Processes that fit the latter pattern are amenable to piecemeal probabilistic forecasting.

Tetlock is not a fan of pundits, for some very good reasons. Most importantly, he argues that the great majority of commentators and prognosticators in the media and on cable news are long on self-assurance and short on specificity and accountability. Tetlock makes several important points: first, that it is possible to form reasonable and grounded judgments about future economic, political, and international events; second, that it is crucial to subject this practice to evidence-based assessment; and third, that it is possible to identify the most important styles, heuristics, and analytical approaches used by the best forecasters (superforecasters).

(Here is a good article in the New Yorker on Tetlock's approach; link.)

1 comment:

zbicyclist said...

It's hard for me to see 9/11 as a Black Swan event. Al Qaeda had carried out several successful terrorist attacks in different foreign countries before this. There had been large terrorist attacks in the United States before this (e.g. the bombing of the federal building in Oklahoma City by domestic terrorists).

Neither the notion that Al Qaeda would stage another terrorist attack nor the notion that there would be another terrorist attack in the United States was beyond comprehension at the time.