Science Lesson: What Is A Model (And What Is Not)

Models, in the scientific sense, are simplifications of a system (which can mean anything from a body’s biology to the statistical relationship among any set of variables) that offer ways to learn something about the real system. They are typically used when studying the real system directly would be too costly, or when it cannot be manipulated or perhaps even observed. In research related to tobacco, the term is used correctly in phrases like “statistical model” or “animal model.” These are models, though often very bad ones (e.g., another animal’s biology, let alone its psychology in a lab, is a misleading simplification of how humans work). However, when the word “model” is featured prominently, it is usually used inaccurately, to mean little more than “a calculation with many arithmetic steps.” The results of these faux-models are usually just a dressed-up combination of speculation and error, as was the case with the recent article that claimed “e-cigarette use currently represents more population-level harm than benefit.” But they are taken seriously because of the mystique created by calling them models.

To understand the concept of a scientific model, compare drawing an airplane with making a miniature copy of it (often called a “model airplane”; the dozens of meanings of the word create quite a bit of confusion). You cannot learn anything genuinely new from the drawing; what you put in is all you can get out. Perhaps you were not conscious of the wingspan-to-length ratio when making the drawing, and it can be estimated by looking at the picture. But this is merely a simple calculation based on your inputs, nothing you could not have determined without creating the drawing. The miniature, however, creates ways to learn something new. It can be put in a wind tunnel to see what happens when a particular gust hits it on the runway (not to be confused with a “runway model”).

“Scientific model” is not precisely defined, but it is useful to think in terms of two characteristics. First, a true model is capable of producing emergent results: outcomes that are not just trivial implications of some input. It can be used to discover something new, as with the airplane miniature but not the drawing.

Second, a model can be tested, just like any apparatus used in a scientific analysis. If a thermometer tells you the room temperature is 54 degrees Celsius, it is time to get a new thermometer. If your miniature airplane insists on flipping upside down in the wind tunnel, you can be sure that you created a bad model of the real airplane and should doubt any observations based on it. Similarly, assessments using the typical animal model of carcinogenicity (applied to vapor or anything else) are basically worthless because two core aspects of the model have been demonstrated to be faulty: rodents are not very good predictors of human carcinogenicity, and it is incorrect to make linear extrapolations from large doses to realistic ones.

With this context, consider “models” of future smoking rates, the source of such claims as “a billion people will die from smoking in the 21st century.” Here the word “model” is misleadingly used to refer to a mere series of calculations. Someone assumes that X percent of each new cohort will take up smoking, Y percent of smokers will quit every year, and so on. They then have a computer run through the resulting arithmetic for a hundred years. If this were actually a scientific model, then grade-school word problems (“two trains leave their stations, moving toward each other at speeds of…”) would be scientific models. There is no possibility of either emergent results or testing. It is just the calculation of a “what if” scenario.
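
To make the point concrete, here is a minimal Python sketch of that kind of arithmetic. It assumes, purely for illustration, a fixed-size new cohort each year, a fixed initiation rate, and a fixed annual quit rate. The function names and all parameter values are hypothetical and are not taken from any published projection. The loop can be replaced by a one-line geometric-series formula, which is another way of saying that every number it produces is already contained in the inputs.

```python
# A hypothetical "projection" of the kind described above: fixed initiation
# and quit rates applied year after year. All parameter values are
# illustrative assumptions, not figures from any published projection.

def project_smokers(years, cohort_size, initiation_rate, quit_rate, start_smokers=0.0):
    """Return the number of current smokers at the end of each year."""
    smokers = start_smokers
    history = []
    for _ in range(years):
        # A fixed share of existing smokers quits; a fixed share of the new cohort starts.
        smokers = smokers * (1 - quit_rate) + cohort_size * initiation_rate
        history.append(smokers)
    return history


def closed_form(years, cohort_size, initiation_rate, quit_rate, start_smokers=0.0):
    """Same final answer without a loop: a geometric series in the inputs."""
    keep = (1 - quit_rate) ** years
    inflow = cohort_size * initiation_rate
    return start_smokers * keep + inflow * (1 - keep) / quit_rate


if __name__ == "__main__":
    # Hypothetical inputs: 1,000,000 new people per year, 20% start, 3% of smokers quit per year.
    path = project_smokers(100, 1_000_000, 0.20, 0.03)
    print(round(path[-1]), round(closed_form(100, 1_000_000, 0.20, 0.03)))
```

Changing any input moves the output exactly as the formula dictates, and nothing in the output can be tested against reality until the projected years actually pass.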

It does, however, turn out that there is a model at the core of those “models.” And it is badly wrong.

The real model consists of something like the following: “assume tobacco use behavior patterns from the 1990s are a valid model of behavior for the next century.” This is the actual modeling step, playing the same role as “assume this miniature behaves the same way in a wind tunnel as the real airplane.” The rest is just arithmetic. It turns out that this model of the future is like the miniature airplane that flies upside down. The period used as the model of the future saw only modest trends in behavior (e.g., a slightly smaller portion of the population started smoking each year). But a mere 17 years into the modeled century, we observe huge deviations from the model. In particular, there has been substantial replacement of smoking over a very short period, contrary to the modeling assumptions: snus substantially replaced smoking in Norway, and vaping did the same in the UK, possibly in Iceland, and among American teenagers.

The actual model part of these “models” is testable, and it has failed. Moreover, we did not have to wait to realize that it was a bad model. If the same model were applied to almost any earlier time period (taking one decade’s patterns and assuming they represent the behaviors for the subsequent century), it would fail just as badly. Indeed, you would have to go back to the 19th century to find a period for which this model held up for even 30 years. This is quite similar to the previously analyzed model employed by Stanton Glantz, in which he assumed that smoking rates among American teenagers decline linearly over time. A look at the historical data made clear that this simplification (i.e., modeling choice) was invalid. The same is true for the big “how many people will smoke in the future” models; the failure is just better hidden by all the bells and whistles.

(In addition, the “future deaths from smoking” models implicitly assume no changes since the 1990s in medical technology and other important factors. These omissions weaken the models still further. It is simply impossible to estimate smoking rates 20 years from now, let alone deaths over a century.)

Of course, someone can always go back to the calculations and say, “here is what will happen if 20 percent of smokers in a country switch to vaping over five years; see, it is flexible and can take that into consideration!” But this is really a concession that the actual model was wrong and the reported results are meaningless. Moreover, the flexibility further illustrates that the faux “model” is just a complicated “what if” calculation. There is nothing inherently wrong with “what if” calculations; good policy analysis is full of them. But such results should be reported as what they are, mechanical exercises in assessing the implications of hypotheticals, not as scientific predictions that result from defensible underlying models.
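
A Python sketch of that kind of “flexibility” shows that the switching scenario is just one more input. Again, the names and numbers are hypothetical. Here `switch_share` of current smokers is assumed to move to vaping, removed at a constant annual rate over `switch_years` years. That is an illustrative assumption about how “20 percent over five years” might be implemented, not anyone’s published method.

```python
# Hypothetical "what if" knob: a share of smokers switches to vaping over a
# few years. The switching scenario is simply another assumed input.

def project(years, cohort_size, initiation_rate, quit_rate, start_smokers,
            switch_share=0.0, switch_years=0):
    """Return current smokers at the end of each year under the stated assumptions."""
    # Constant annual rate that compounds to `switch_share` over `switch_years`
    # (applied to whoever is smoking during those years).
    if switch_years > 0:
        annual_switch = 1 - (1 - switch_share) ** (1 / switch_years)
    else:
        annual_switch = 0.0

    smokers = start_smokers
    history = []
    for year in range(years):
        smokers *= 1 - quit_rate                  # assumed quitting
        if year < switch_years:
            smokers *= 1 - annual_switch          # assumed switching to vaping
        smokers += cohort_size * initiation_rate  # assumed new smokers
        history.append(smokers)
    return history


if __name__ == "__main__":
    # Illustrative inputs only.
    baseline = project(100, 1_000_000, 0.20, 0.03, start_smokers=5_000_000)
    switched = project(100, 1_000_000, 0.20, 0.03, start_smokers=5_000_000,
                       switch_share=0.20, switch_years=5)
    print(round(baseline[-1]), round(switched[-1]))
```

The “what if” answer moves exactly as the new assumption dictates. That is useful for exploring hypotheticals, but any change in behavior has to be typed in by hand rather than emerging from the calculation.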

All of this might seem rather esoteric until a new “model” result is all over the news, as with the recent claim that vaping creates a net population health cost. The absurd paper is the latest by anti-vaping activist Samir S. Soneji of Dartmouth (think: Glantz, but without the conman social skills). Soneji and colleagues claim to have created a model that leads to their results, but they really did not. Instead, they just used a ridiculously complicated method to calculate the simple implications of a set of assumptions, and inaccurately called that a model. The key assumption was that for a huge portion of teenagers who try vaping, vaping will cause them to start smoking, even though there is no evidence that vaping causes anyone to start smoking. Starting with that “what if” assumption, you could calculate the implications on a bar napkin rather than using the fancy tools they employed. Basically, someone on the research team learned a fun new software tool, and it was used to create the illusion of an actual scientific analysis.
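
For comparison, here is roughly what a bar-napkin version of that kind of calculation looks like, written as a Python sketch. It is not a reconstruction of the paper’s method; every name and number is a hypothetical placeholder. The only input that really matters is the assumed fraction of teenagers who try vaping and are caused to start smoking. The sign of the “net” result is decided entirely by whether that assumed fraction is above or below a break-even value computed from the other assumptions.

```python
# Napkin arithmetic with hypothetical inputs: life-years gained from adults
# who quit smoking by vaping, minus life-years lost among teenagers whom the
# calculation *assumes* vaping caused to start smoking.

def net_life_years(adult_quitters, teen_vapers, induced_fraction,
                   years_gained_per_quitter, years_lost_per_new_smoker):
    """Assumed gains minus assumed losses; positive means net benefit."""
    gained = adult_quitters * years_gained_per_quitter
    # teen_vapers: teenagers who try vaping; induced_fraction: share assumed
    # to be caused to start smoking.
    lost = teen_vapers * induced_fraction * years_lost_per_new_smoker
    return gained - lost


def break_even_fraction(adult_quitters, teen_vapers,
                        years_gained_per_quitter, years_lost_per_new_smoker):
    """The assumed induced fraction at which gains and losses cancel."""
    return (adult_quitters * years_gained_per_quitter) / (
        teen_vapers * years_lost_per_new_smoker)


if __name__ == "__main__":
    # Placeholder inputs, chosen only to show the mechanics.
    threshold = break_even_fraction(1_000, 10_000, 5.0, 10.0)
    for assumed in (threshold / 2, threshold * 2):
        # Any assumed fraction above the threshold yields "net harm".
        print(assumed, net_life_years(1_000, 10_000, assumed, 5.0, 10.0))
```

Below the break-even fraction the result is a net benefit; above it, a net harm. Adding more software machinery does not change the fact that the conclusion is fixed the moment that fraction is chosen.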

The common criticism of this “model” has been the observation “garbage in, garbage out.” That is a valid concern with any actual model; if the data fed into a statistical model is full of measurement errors, for example, then the results will be wrong even if the model is good. But this gives Soneji far too much credit. There is no “in…out.” There is just the trivial implication of one assumption, hidden behind some pointless complication that serves only to distract the marks from noticing that the garbage is all there is. (Lacking Glantz’s natural interpersonal conman skills, Soneji is apparently trying a flashier con.) There is no model, not even at the level of “assume the future will all look like the past.”

These calculation exercises should perhaps be called “Potemkin models,” after the story of fake Russian villages that became a metaphor for a facade with nothing behind it. Behind Soneji’s Potemkin model lies literally nothing: no model, just a single ridiculous assumption and what follows immediately from it. Behind the Potemkin models of future smoking is an actual model, but it is a complete failure. The seemingly impressive calculations are just an exercise that any high-school math whiz could throw together using free software tools. Such a tool can be useful, just as a pen and bar napkin can be, but it does not constitute scientific analysis. In tobacco control research, these tools are just used to distract from the weakness of the actual models.

Follow Dr. Phillips on Twitter