Daily Vaper

Another Paper Uses Distracting Complexity To Suggest Vaping Doesn’t Work

Carl V. Phillips, Contributor

A recent paper in BMJ Open claims to show that smokers’ use of vapor products to partially replace smoking does not result in lower per-day cigarette consumption. This conclusion is a ridiculous stretch based on bad study methods, but it is being adopted as yet another point of attack by anti-vaping activists whose second-favorite message (after “think of the children”) is to claim that vaping just keeps people smoking. The paper, “Is prevalence of e-cigarette and nicotine replacement therapy use among smokers associated with average cigarette consumption in England? A time-series analysis,” by Emma Beard, Jamie Brown, Susan Michie, and Robert West, looks at population averages using extremely complex methods that distract from the fact that this is simply the wrong analysis. This is the same research team that used junk methods and obviously inaccurate assumptions to produce the (thoroughly debunked) claim that a mere 16,000 smokers quit because of vaping in England in 2014, despite more than ten times that many switching from smoking to vaping.

Proper observational studies should be conceptualized in terms of the counterfactual experiment they are trying to proxy for, and should try to match it as closely as possible. If it is simply not possible to do that with a particular dataset, as in this case, then it really should just not be tried. The counterfactual of interest here is “for the average smoker who also vapes, how much less does she smoke than she would in a world where vaping was not an option?” The perfect god-mode experiment would be to rerun a version of the world without vapor products and see how much more each such person smokes in that counterfactual world. The closest analog of the impossible experiment consists of observing how much the same people smoked before and after adding vaping to their mix (theoretically, following smokers over time and watching some of them take up vaping would be ideal, but asking retrospectively is more practical and works just fine). It is not perfect because we cannot be sure their before-after smoking difference was not affected by other changes over time, but it is probably the best we can do.
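To make that before-after design concrete, here is a minimal sketch; the data file and column names are entirely hypothetical placeholders, not anything from the paper. Each smoker who took up vaping serves as her own comparison group.

```python
# Minimal sketch of the before-after (within-person) comparison described above.
# The file name and column names are hypothetical placeholders.
import pandas as pd
from scipy import stats

df = pd.read_csv("dual_users.csv")  # one row per smoker who took up vaping
# cpd_before: cigarettes per day before starting to vape
# cpd_after:  cigarettes per day after starting to vape (while still smoking)

diff = df["cpd_after"] - df["cpd_before"]
print(f"Mean change in cigarettes per day: {diff.mean():.2f}")

# Paired test: each person is her own comparison group
t_stat, p_value = stats.ttest_rel(df["cpd_after"], df["cpd_before"])
print(f"Paired t = {t_stat:.2f}, p = {p_value:.3f}")
```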

An approach that is more common in epidemiology — because for most questions we do not have such a clear before-after comparison — would be to look at smoking quantity in a group who also vapes and compare it to exclusive smokers. In this case, the non-vaping comparison group is standing in for the counterfactual version of the vapers, in which they exist in a world without vaping. But this introduces big confounding problems that do not exist when someone is being used as her own comparison group. For example, it might be that smokers who added vaping were heavier smokers in the first place, so even though they cut down, they still smoked as much as the non-vapers. There are ways to try to “control for” this, but they are imperfect, so it is usually better to do the before-after comparison if possible.
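For contrast, a sketch of that comparison-group design, with an attempt to “control for” baseline smoking intensity, might look like the following. Again, the data and variable names are hypothetical, and the adjustment is only as good as the measured confounders, which is exactly why the before-after design is preferable when it is available.

```python
# Sketch of the comparison-group design: vaping smokers vs. exclusive smokers,
# adjusting for baseline smoking intensity. Hypothetical data and names.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("current_smokers.csv")
# cpd:          current cigarettes per day
# vapes:        1 if the smoker also vapes, 0 otherwise
# baseline_cpd: cigarettes per day before vaping was available (a confounder)

model = smf.ols("cpd ~ vapes + baseline_cpd", data=df).fit()
print(model.summary())  # the coefficient on `vapes` is the adjusted difference
```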

There is also the complication of smokers who vape for a while and then quit smoking entirely. While they are doing both, they are presumably smoking less over time, pulling down the average. Then when they fully quit smoking, they exit the subpopulation that is being studied, and the average of those who remain ticks back up. This shows that this is a bit of an awkward question to ask — “how much does vaping cause smokers to cut down, ignoring anyone after they cut down by 100 percent?” In the before-after version of the study, it is possible to deal with this sensibly, by asking something precise like “among smokers who started vaping but were not smoking abstinent three months later, how much less did they smoke?” In the comparison-group version of the study, this complication becomes much worse and is a huge contributor to the confounding (e.g., smokers who like vaping a lot, and thus are the best candidates to cut down a lot with it, are most likely to have already exited the smoker population). The complication is far worse still when doing a comparison of population averages over time, as the West group did.
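That precise formulation amounts to nothing more than restricting the analysis population before doing the before-after comparison. A hypothetical sketch, with made-up data and column names:

```python
# Restrict to smokers who started vaping but were still smoking three months
# later, then measure how much less they smoked. Hypothetical data and names.
import pandas as pd

df = pd.read_csv("smoker_followup.csv")
cohort = df[(df["started_vaping"] == 1) & (df["still_smoking_3mo"] == 1)]
reduction = cohort["cpd_baseline"] - cohort["cpd_3mo"]
print(f"Mean reduction among non-abstinent dual users: {reduction.mean():.2f} cigarettes/day")
```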

One comparison no one would think was a good proxy for the counterfactual comparison of interest is population averages for both variables, compared across time. This is what the West group did, looking at the association between the prevalence of people who both smoke and vape (at a given time) and the average number of cigarettes smoked per smoker. It seems doubtful the authors even stopped to consider what counterfactual they were trying to proxy for. This is not a comparison of each individual’s smoking intensity compared to her counterfactual self in a world without vaping, nor to her recent-past self, before she started vaping. It is not even a comparison of vaping smokers to the non-vaping smokers at a particular point in time. Most charitably, the authors could have been thinking that they were proxying for a counterfactual comparison of two versions of a population (“if this population in which 5 percent of smokers vaped instead had 10 percent of them vaping, how much lower would average cigarette consumption be?”), which can be seen as (sort of) a worse version of the comparison-group study.

But they were not even really doing that. In terms of the important variable, vaping, the English population changed radically over the last decade, so it was not just a different version of the same population. Recent adopters of vaping are not just a larger collection of the same people who were early adopters. The few early adopters, circa 2010, were undoubtedly more dedicated to cutting down on smoking (they were willing to try what was then a weird new technology) and so may have cut down more on average. More recent adopters, a much larger portion of all smokers, found it much easier to start vaping even if they merely wanted to cut out one cigarette per day so they did not have to step outside the pub, or did not want to cut down at all. So as the number of vapers increased, their average dedication probably decreased. And once again, this excludes everyone who switched completely, meaning that those earlier adopters who started vaping to cut down, and liked it enough to keep cutting down, eventually no longer counted, whereas the larger number of recent adopters are not far down that path. Meanwhile, the average number of cigarettes smoked is changing over time for other reasons, something the authors tried to control for, though such control is always imperfect. These complications and others make it impossible to make much sense of the comparison across time. It is neither surprising nor meaningful that when the researchers compared the averages across time, they found very little association.

How they carried out that comparison also represents a problem, though a minor one compared to it simply being the wrong comparison in the first place. The paper reads like a classroom exercise in which the authors recently learned a new statistical method — for readers who might be interested, it was an econometrics-style time-series ARIMAX model (autoregressive, integrated, moving average, with exogenous variables) — and were trying it out, without regard to whether it was really useful. It is complicated, but basically the authors ran those population averages through various assumptions about how past values of the series affect future values and how changes in the values from one period to the next are affected by outside variables.
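Since the paper does not report its exact model, the following is only a generic illustration of what an ARIMAX-type fit looks like; the model order, data file, and variable names here are placeholders, not the authors’ specification.

```python
# Generic ARIMAX-style illustration: average cigarettes per day regressed on
# e-cigarette prevalence with an ARIMA error structure. Placeholder details.
import pandas as pd
from statsmodels.tsa.statespace.sarimax import SARIMAX

ts = pd.read_csv("toolkit_monthly.csv", index_col="month", parse_dates=True)
# mean_cpd:        monthly average cigarettes per day among current smokers
# ecig_prevalence: monthly prevalence of e-cigarette use among smokers

model = SARIMAX(
    ts["mean_cpd"],
    exog=ts[["ecig_prevalence"]],
    order=(1, 1, 1),  # AR(1), first-differenced, MA(1); an arbitrary placeholder
)
results = model.fit(disp=False)
print(results.summary())  # the coefficient on ecig_prevalence is the "effect"
```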

A reader of this paper who is familiar with econometrics risks having his head explode. Econometrics, equivalent to epidemiology but applied to economics data rather than health data, almost always uses far more complicated statistical methods than epidemiology. Like epidemiology, econometrics is notorious for use of biased model design to tease out “good” results from the data. In econometrics, the methods for doing this are more subtle and more varied because the complicated models create a lot more little options to tweak the results without doing anything apparently out of the ordinary. However, at least in an econometrics paper, unlike epidemiology, the exact model equation used and other details of the model are always reported. The reader never knows what other models were tried and discarded because the results were not as “good,” as with epidemiology, but at least he knows what they did. Not in this case. The model is not reported and most of the rest of the methods description just leaves the reader wondering what it really means.

Another clue that this is a bad model choice is that it showed basically no effect from anything (some policy changes and NRT use were also modeled). When you have a straightforward, simple model, like the before-after comparison, you can be pretty confident it basically does what you want. But for a weird and complicated model like the one the West team used, you need to show it works — somehow, for something — before reaching any conclusions. Consider the analogy of testing an electrical outlet: If you have a lamp you use regularly, you can just plug it in and be pretty confident it is giving you the right answer. But if you drag an old piece of computer equipment out of the basement, plug it in, and get nothing, it might be that there is no power to the outlet, but it is quite likely that instead the untested device is not functional.

This paper represents a terrible failure of health journal peer review, of a non-typical sort. It seems safe to assume that the reviewers for the medical journal that published this could make no sense of the methodology (someone familiar with econometric time-series modeling can only make enough sense of it to figure out that not enough is reported to make sense of it). So they just ignored the methods, assuming that they were fine (“I can’t understand what they did, so it must be something impressive”). Approximately 0 percent of readers will be able to make sense of what was done. But awash in the baffling, inadequate reporting of complicated methods, which probably do not really matter much at all, reviewers and readers are distracted from the fact that this is simply a terrible comparison for assessing the counterfactual of interest. There are simple ways to get a much better measure of how much introducing vaping causes smokers to cut down. Indeed, the most solid and simple method (before-after comparisons) also allows the outcome of complete switching to be observed and factored into what is reported, rather than making complete success the equivalent of not even trying in terms of the results.

This is a clear case of someone having one hammer in their toolbox (the Smoking Toolkit survey population averages) and trying to apply it to every imaginable question. The reader should not be distracted by the fact that the authors acquired an ARIMAX Accessories Kit to dress up their hammer — despite the glitter, stickers and googly eyes, it was still a hammer and it was a useless tool for the job.
