The Fine Print of Pandemic Models: Realism and Utility
There are two dominant types of dynamic models for projecting the progression of viral outbreaks. They also enable policy-makers to assess the likely effects of public health measures in reducing the rate of new infections.
SIR models project viral spread based on two key inputs: the rates of “flow” of individuals in a population from a Susceptible state to an Infected one and from there to a Recovered state. These models combine the estimated impact of proposed social isolation and distancing measures into a single number that throttles the rate of infection “valve” controlling the flow from S to I.
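The flow picture above can be made concrete with a minimal sketch. It assumes the textbook SIR equations, integrated with simple forward Euler steps; the parameter names (`beta` for the infection rate, `gamma` for the recovery rate, `throttle` for the single number representing distancing measures) and all numeric values are illustrative assumptions, not figures from any real outbreak.

```python
def simulate_sir(s0, i0, r0, beta, gamma, throttle=1.0, days=160, dt=0.1):
    """Integrate the basic SIR flow equations with forward Euler steps.

    beta:     infection rate (assumed: contacts per day x transmission chance)
    gamma:    recovery rate (assumed: 1 / average infectious period in days)
    throttle: hypothetical multiplier in [0, 1] standing in for the combined
              effect of isolation and distancing on the S -> I flow
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_infections = throttle * beta * s * i / n * dt   # flow S -> I
        new_recoveries = gamma * i * dt                     # flow I -> R
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((step * dt, s, i, r))
    return history


def peak_infected(history):
    """Largest number of simultaneously infected people in a run."""
    return max(i for _, _, i, _ in history)


if __name__ == "__main__":
    baseline = simulate_sir(9990, 10, 0, beta=0.4, gamma=0.1)
    distanced = simulate_sir(9990, 10, 0, beta=0.4, gamma=0.1, throttle=0.5)
    print(f"peak infected, no measures:  {peak_infected(baseline):.0f}")
    print(f"peak infected, throttle 0.5: {peak_infected(distanced):.0f}")
```

Tightening the throttle lowers and delays the peak, which is the “flattening” effect that policy-makers look for. A production model would more likely use an adaptive ODE solver than fixed Euler steps.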
The second technique, Agent-Based Models (ABMs), simulates a population of individuals as they move about and engage in work, education, and other social activities. The projected spread of the virus emerges over time from simulating encounters between individuals given an estimated probability of exposure from contact with infected persons. In contrast to SIR models, ABMs treat each public health measure as an independent change in behavior that restricts an agent’s pattern of movements. Agents decide which measures to adopt or ignore. Taken collectively, agents’ decisions reduce the frequency and intensity of social interactions, reducing exposure to the virus, and thus, the rate of infection.
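To illustrate the agent-based approach, here is a deliberately tiny sketch. Everything in it is an assumption made for illustration: agents wander at random inside a unit square, a susceptible agent within `contact_radius` of an infected one is exposed with probability `p_exposure`, infected agents recover after a fixed `recovery_days`, and `isolation_share` is the fraction of agents who decide to self-isolate (and therefore stop moving). Real ABMs model far richer behavior.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    x: float
    y: float
    isolates: bool = False   # this agent's own decision to self-isolate
    state: str = "S"         # "S", "I", or "R"
    days_infected: int = 0


def clamp(value):
    """Keep a coordinate inside the unit square."""
    return min(1.0, max(0.0, value))


def run_abm(n_agents=200, n_seed=5, days=60, p_exposure=0.3,
            contact_radius=0.05, step_size=0.05, recovery_days=14,
            isolation_share=0.0, seed=1):
    rng = random.Random(seed)
    agents = [Agent(x=rng.random(), y=rng.random(),
                    isolates=rng.random() < isolation_share)
              for _ in range(n_agents)]
    for agent in agents[:n_seed]:
        agent.state = "I"

    daily_infected = []
    for _ in range(days):
        # Movement: isolating agents stay put, the rest take a random step.
        for a in agents:
            if not a.isolates:
                a.x = clamp(a.x + rng.uniform(-step_size, step_size))
                a.y = clamp(a.y + rng.uniform(-step_size, step_size))

        # Encounters: each nearby infected agent is one chance of exposure.
        infectious = [a for a in agents if a.state == "I"]
        newly_infected = []
        for a in agents:
            if a.state != "S":
                continue
            for b in infectious:
                close = (a.x - b.x) ** 2 + (a.y - b.y) ** 2 <= contact_radius ** 2
                if close and rng.random() < p_exposure:
                    newly_infected.append(a)
                    break

        # Recovery after a fixed infectious period, then apply new infections.
        for b in infectious:
            b.days_infected += 1
            if b.days_infected >= recovery_days:
                b.state = "R"
        for a in newly_infected:
            a.state = "I"
        daily_infected.append(sum(a.state == "I" for a in agents))
    return agents, daily_infected


if __name__ == "__main__":
    for share in (0.0, 0.5, 0.9):
        agents, curve = run_abm(isolation_share=share)
        ever_infected = sum(a.state != "S" for a in agents)
        print(f"isolation share {share:.1f}: peak {max(curve)}, "
              f"ever infected {ever_infected}")
```

Note that no infection rate is supplied anywhere: the epidemic curve emerges from individual movements and encounters, which is the defining trait of this class of model. Because the runs are random, results for any one seed should be read as a single sample, not a forecast.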
Both models inform government policy decisions by predicting the consequences of doing nothing or intervening to try to control the course of the viral outbreak. There is, of course, no free lunch. Statistician George Box observed that “all models are wrong, but some are useful.” In other words, models only approximate reality. SIR and agent-based models are no exception; their projections of viral spread and the impacts of public health policies on that spread are imperfect. Deviations that we observe between model projections and reality arise from two sources: the mathematical “mechanics” of models and their dependencies on data inputs and assumptions. This article reviews three important issues that condition or constrain the usefulness of pandemic models to government decision-makers: realism, quality of data inputs, and scope.
The basic SIR model makes numerous assumptions to simplify the equations governing flow. The population remains constant in size (N = S + I + R, with deaths due to the virus either ignored or counted within R). All members of S are equally susceptible to viral infection. The mixing process, or pattern of interaction between individuals in S and I states, is treated as uniform and constant over time. Susceptible persons who are exposed to the virus transition to I and become contagious immediately. Individuals who recover are permanently immune to reinfection. Similarly, the simplest ABMs rely on a limited and stylized set of agent behaviors and decisions to simulate social interactions (random motion and collisions within some fixed region) and self-isolation (motionlessness).
Both models are often enhanced, relaxing these assumptions to improve realism. SIR models can be extended into SEIR models by adding a fourth state (or compartment) to distinguish Exposed from Infected individuals and adding a rate of flow between E and I that incorporates a time lag to represent the incubation period of the virus. SIRS models extend SIR by adding a flow from R back to S to represent the fact that recovery from some viruses confers only temporary immunity, owing to seasonal variations or mutations in viruses and/or impermanent adaptation by human immune systems. Similarly, ABMs can be enhanced by extending the repertoire of agent behaviors (e.g., work, commuting, travel, shopping, recreation), and decisions to adopt various public health measures that alter those behaviors. Additionally, the behaviors and decision triggers assigned to agents can be variable, conditioned on agent properties such as age, health condition, occupation, race, types of residence, and social roles.
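The two flow-model extensions described above can be sketched in one function. This is a sketch under stated assumptions: `sigma` is a hypothetical rate for the E to I flow (so the incubation period is an average delay of 1/`sigma` days, not a fixed time lag), and `omega` is a hypothetical rate at which immunity wanes (setting it to zero gives a plain SEIR model with permanent immunity). All numeric values are illustrative.

```python
def simulate_seirs(s0, e0, i0, r0, beta, sigma, gamma, omega=0.0,
                   days=365, dt=0.1):
    """Forward Euler integration of an SEIR model with optional waning immunity.

    sigma: rate of flow E -> I (assumed 1 / average incubation period)
    omega: rate of flow R -> S (assumed 1 / average duration of immunity);
           omega = 0 means recovered individuals stay immune
    """
    n = s0 + e0 + i0 + r0
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    history = [(0.0, s, e, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        exposures = beta * s * i / n * dt   # S -> E
        onsets = sigma * e * dt             # E -> I
        recoveries = gamma * i * dt         # I -> R
        waned = omega * r * dt              # R -> S (the SIRS extension)
        s += waned - exposures
        e += exposures - onsets
        i += onsets - recoveries
        r += recoveries - waned
        history.append((step * dt, s, e, i, r))
    return history


if __name__ == "__main__":
    permanent = simulate_seirs(9990, 0, 10, 0, beta=0.4, sigma=0.2, gamma=0.1)
    waning = simulate_seirs(9990, 0, 10, 0, beta=0.4, sigma=0.2, gamma=0.1,
                            omega=1 / 90)
    print(f"susceptible after a year, permanent immunity: {permanent[-1][1]:.0f}")
    print(f"susceptible after a year, waning immunity:    {waning[-1][1]:.0f}")
```

With waning immunity the susceptible pool refills over time, so the virus can keep circulating instead of burning out. Each extra compartment also adds a parameter that must be estimated from data, which is the cost side of added realism.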
Extending models of viral contagion heightens realism, but also incurs costs for increased complexity: more assumptions, increased effort to gather data inputs, programming more complicated agent behaviors, and greater computational burdens to solve more complex flow equations or simulate large, diverse populations of agents. Decision-makers must understand a pandemic model’s degree of realism (which aspects of the dynamics of viral contagion it accounts for and which it ignores or finesses) before betting the public welfare on its predictions.
Quality of Data Inputs
All models are vulnerable on this account; as software programmers put it, Garbage In, Garbage Out. Pandemics exacerbate this problem. Recognition of the emergence of a novel virus that attacks humans tends to be slow, and diagnostic tests aren’t yet available, delaying the collection of crucial data as an outbreak begins. Gathering reliable data later, during the crush of viral surges, is equally problematic, as nursing homes, established and makeshift hospitals, and morgues are overwhelmed with patients and bodies. COVID-19 aggravated these problems further: many infected individuals are asymptomatic while contagious, many countries even now have serious shortages of diagnostic tests, and antibody tests to verify recovery are still maturing. Many diagnostic tests are known to be highly inaccurate, yielding as many as 30% false-negative results. Many antibody tests still lack formal certification for quality, and, more seriously, the correlation between antibodies and actual immunity (and its duration) is still unknown. These factors conspire to compromise estimates for key input parameters such as compartment sizes and the rates of infection and recovery for flow models like SIR.
Fortunately, problems of data quality, completeness, and uncertainty are far from novel; several techniques are available that help to compensate for many of them. One approach is to base decisions on outputs from multiple independent models from credible sources. Consistent projections of similar infection curves (and the effects of interventions), despite minor variations in data sets and assumptions, bolster trust.
Alternatively, analysts can produce multiple independent projections of viral spread by varying assumptions and input data to a single model. The simplest variant manipulates the source data to produce two additional data sets, one representing the best case (e.g., small I population, low rate of infection, and high rate of recovery for a SIR model) and the other, the worst case. Solving the SIR equations (or running ABM simulations) for all three sets of inputs generates an “envelope” around the base case that clearly conveys the gaps and imprecision in key data inputs. Additionally, policy decisions can be tested and validated across the envelope, using best-, base-, and worst-case assumptions, to build confidence in the model and recommended measures despite serious uncertainty.
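A minimal sketch of the envelope idea follows, assuming a basic SIR model and three hypothetical input sets. The scenario values are invented for illustration; real ones would be derived from the available surveillance data.

```python
def infected_curve(i0, beta, gamma, n=10_000, days=200, dt=0.1):
    """Return the number of infected people at the end of each day (basic SIR)."""
    s, i = float(n - i0), float(i0)
    steps_per_day = int(round(1 / dt))
    curve = []
    for _day in range(days):
        for _step in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            s -= new_infections
            i += new_infections - gamma * i * dt
        curve.append(i)
    return curve


# Hypothetical input sets (assumptions, not estimates for any real outbreak).
SCENARIOS = {
    "best":  {"i0": 5,  "beta": 0.30, "gamma": 0.125},
    "base":  {"i0": 10, "beta": 0.40, "gamma": 0.100},
    "worst": {"i0": 20, "beta": 0.50, "gamma": 0.080},
}


def build_envelope(scenarios):
    """Run every scenario and take the day-by-day minimum and maximum."""
    curves = {name: infected_curve(**params) for name, params in scenarios.items()}
    lower = [min(day) for day in zip(*curves.values())]
    upper = [max(day) for day in zip(*curves.values())]
    return curves, lower, upper


if __name__ == "__main__":
    curves, lower, upper = build_envelope(SCENARIOS)
    for day in (30, 60, 90):
        print(f"day {day}: base {curves['base'][day]:.0f}, "
              f"envelope {lower[day]:.0f} to {upper[day]:.0f}")
```

The envelope is computed day by day as a minimum and maximum because the curves can cross: a slower best-case outbreak may still be rising after a faster base-case outbreak has peaked and begun to decline.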
A more sophisticated approach leverages the Monte Carlo simulation technique. Monte Carlo models replace point estimates for values of key parameters with statistical distributions estimated from available data about the current and prior pandemics. For example, a single estimated probability of exposure given contact might be converted into a normal (Gaussian) distribution with a mean of 0.3 and a standard deviation of 0.05. Next, the pandemic model is solved or simulated tens or hundreds of thousands of times, using input values for the uncertain parameters that are sampled at random from their associated distributions. This produces a distribution of outcomes: alternative projected chronologies of the viral spread rather than one or several discrete forecasts.
Many retirement calculators use Monte Carlo techniques to project whether your financial portfolio will generate sufficient income for your needs until death despite uncertainty about your portfolio’s annual rates of return and inflation in the future. These calculators use variable inputs sampled annually from distributions for rates of return and inflation over your assumed lifespan rather than relying on average values for those rates across those years. This is important because investment gains (or losses) and inflation compound annually: above- or below-average values early in the game exert disproportionate influence on portfolio sizes over time. Viral infections also grow non-linearly; the total numbers of infections and deaths projected for a particular day in the future are highly sensitive to variations in input parameters and values from previous days.
You can also calculate the percentage of times that Monte Carlo simulations produce specific results from their output distributions. For example, a given set of public health measures might produce projections for total infections and deaths that fall below 30,000 and 2,500, respectively, four weeks from now in 78% of the runs.
Thus, Monte Carlo provides a more prudent basis for making pandemic decisions than betting on a single set of model input values, one roll of the dice, to predict outbreak statistics.
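The Monte Carlo procedure can be sketched in a few lines. The sketch assumes a basic SIR model in which the infection rate is the product of a hypothetical `contacts_per_day` and the uncertain probability of exposure, which is drawn once per run from a normal distribution with mean 0.3 and standard deviation 0.05. The population size, initial infections, recovery rate, and the modest number of runs are all illustrative assumptions.

```python
import random


def infections_after(days, p_exposure, contacts_per_day=1.0, gamma=0.1,
                     n=1_000_000, i0=100, dt=0.25):
    """Cumulative infections after `days`, from a basic SIR model (Euler steps)."""
    beta = contacts_per_day * p_exposure
    s, i = float(n - i0), float(i0)
    for _ in range(int(round(days / dt))):
        new_infections = beta * s * i / n * dt
        s -= new_infections
        i += new_infections - gamma * i * dt
    return n - s   # everyone who has ever left the susceptible state


def monte_carlo(runs=2000, mean=0.3, sd=0.05, days=28, seed=42):
    """Sample the uncertain input, rerun the model, and collect the outcomes."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(runs):
        # Clip the sample so it remains a valid probability.
        p_exposure = min(1.0, max(0.0, rng.gauss(mean, sd)))
        outcomes.append(infections_after(days, p_exposure))
    outcomes.sort()
    return outcomes


def share_below(outcomes, threshold):
    """Fraction of runs whose outcome falls below the threshold."""
    return sum(o < threshold for o in outcomes) / len(outcomes)


if __name__ == "__main__":
    outcomes = monte_carlo()
    median = outcomes[len(outcomes) // 2]
    low, high = outcomes[len(outcomes) // 20], outcomes[-len(outcomes) // 20]
    print(f"median infections at four weeks: {median:,.0f}")
    print(f"middle 90% of runs: {low:,.0f} to {high:,.0f}")
    print(f"share of runs below 30,000: {share_below(outcomes, 30_000):.0%}")
```

Because infections compound, a symmetric input distribution yields a skewed output distribution, which is why the full spread of outcomes is more informative than a single run at the mean. A fuller treatment would sample several uncertain parameters at once and might redraw them over time, as the retirement calculators do year by year.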
Epidemiological models inform policy decisions to manage the public health dimensions of pandemic crises by forecasting future numbers of infections, hospitalizations, and deaths. However, this focus on disease statistics doesn’t offer much insight into the broader impacts of viral outbreaks on the economics and security of communities, states, or nations. For starters, many of those exposed or infected are front-line health care workers such as nurses and doctors or first responders: health care capacity degrades as these individuals must self-quarantine or, worse still, seek treatment that further burdens limited medical resources. Similar disruptions to capacity threaten our national supply chain for food, as workers at meat processing plants and grocery stores become infected.
Pandemic models are largely mute on these corollary societal consequences of viral statistics, at least on their own. One way to approach this crucial shortcoming is to connect pandemic simulations with other analytical models. For example, one simulation model helps hospitals or health care systems manage their treatment capacity to handle crisis surges such as epidemics. Economic models are also available for analyzing direct and indirect impacts on regional or national economies from terrorist attacks such as 9/11 or natural disasters such as hurricanes. The US Government has developed dynamic models for the nation’s critical infrastructure systems—the industrial sectors that are essential for the nation’s security and core functions, including emergency services, food production and distribution, transportation, energy, communications, and banking. Initial efforts have been made to couple pandemic models with these systems, using workforce depletion to analyze losses in infrastructure capability and reliability. A concerted effort should be undertaken to fully integrate these decision support models, if not for the current pandemic, then to prepare for future ones.
Most public officials and economic experts agree that we are in uncharted territory with regard to restarting the economy and resuming social activities without triggering further waves of COVID-19 infections. Leaders’ choices are complicated by the absence of comprehensive dynamic models to evaluate alternative strategies and to identify and refine the most promising option. Until we plug this gap, complex decisions for responding to and recovering from pandemics will likely produce social and economic consequences that are unintended and unpalatable. My final article will review the challenging decisions about COVID-19 that we currently face.