Science Lesson: Epidemiology Cannot Answer Yes-No Questions

Epidemiology – the quantitative analysis of diseases and their causes at the population level – can often be found in today’s news. A typical headline reads, “E is found to cause D” where E is some exposure and D is a disease. This language usually comes directly from the titles of epidemiology papers, so one would be forgiven for thinking that the science is primarily about answering yes-no questions: “Does E cause D? Yes or no?”

Epidemiology is not about discovering the mere existence of relationships or other phenomena. Like most sciences, it is about assessing magnitude. The same is true for social sciences at the edge of what is technically epidemiology, like studies of what causes people to switch from smoking to vaping. These sciences are not equipped to answer “no” to any question: it would be impossible to gather data that shows that, say, drinking milk does not (ever, for anyone, under any circumstances) cause AIDS. As such, epidemiology and epidemiological conclusions must be used carefully.

Consider the example of the “gateway effect” that was discussed in a previous science lesson. The typical question – “does vaping cause teenagers to take up smoking?” – is actually nonsense. Taken literally, it asks whether at least one or two teenagers started smoking because of vaping, or someday will. It should be obvious that no conceivable body of evidence could ever show that the answer is “no.” Similarly, learning that the answer is “yes” would be completely useless information. No study to date, if the results are interpreted honestly, has offered evidence that the answer is “yes.” Still, it is almost certainly true, at least at the “one or two” level. But before any legitimate decisions could be made based on this, we would need to know how strong the effect is.

Science reporters, and even most health researchers, write as if all science looks like the discovery of the Higgs boson or a cellular mechanism, which are some of science’s relatively rare yes-or-no questions. In reality, most scientific research is concerned with quantitative questions: For example, “how much better will crops grow with a particular fertilizer?” or “how many people lived in North America 10,000 years ago?”

Not only can epidemiology never provide evidence that E does not cause D (ever, at all), but for most any question we might ask, the answer is always “yes, E does cause D.” Does smoking cause lung cancer? Yes. Does smoking ever prevent lung cancer? Almost certainly yes. That is, there is almost certainly someone among all past and current smokers who would have gotten lung cancer at a particular age had he not smoked, but lived to that age without lung cancer because he smoked.

Epidemiology almost always measures the net increase in risk: the increase that the exposure causes minus any protective effect it has. For the case of smoking and lung cancer, any trivial protective effect makes little difference. For a gateway effect, however, netting means that it is nearly impossible to detect whether there are any gateway cases, because the protective effect is almost certainly greater. Fortunately, epidemiology is a practical science of measurement, not a theoretical science of discovery. We only care about the net magnitude.
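The netting can be made concrete with a small sketch. The counts below are hypothetical, invented purely for illustration and not taken from any study, and the function name is likewise just a label for this example. Two very different causal stories produce exactly the same net figure, which is all a population-level study can observe.

```python
# Hypothetical counts, invented for illustration only (not from any study).

def net_excess_cases(caused, prevented):
    """Net change in the number of cases: cases caused minus cases prevented."""
    return caused - prevented

# Scenario A: the exposure causes 50 cases and prevents 48 others.
scenario_a = net_excess_cases(caused=50, prevented=48)

# Scenario B: the exposure causes 2 cases and prevents none.
scenario_b = net_excess_cases(caused=2, prevented=0)

# A population-level study sees only the net figure, so it cannot
# tell these two scenarios apart.
print(scenario_a, scenario_b)

# If the protective effect is larger, the net is negative even though
# some individuals were harmed.
scenario_c = net_excess_cases(caused=5, prevented=40)
print(scenario_c)
```

In the third scenario there are real cases caused by the exposure, yet the only thing that shows up in the data is a net reduction.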

If it were discovered that vaping causes cancer, would that be a good reason to stop vaping? It obviously depends on whether the net increase in risk is a small fraction of 1 percent or 20 percent. Or perhaps vaping sometimes causes cancer, but it also prevents diseases more often than that, providing a net benefit. Moreover, if the alternative were smoking, we should compare those numbers to the increases in risks from smoking. Nearly everyone understands this, but is tricked into forgetting it on reading the headline, “Study shows that E causes cancer!”

It is not just a problem with headline writers. In my role as a journal editor I recently reviewed a good paper by skilled, honest researchers. They were cleverly demonstrating a problem with the way epidemiology is done, namely that the choice among different valid measurements of a particular variable can produce different results (this will be the topic of the next science lesson, but this one is needed to lay the groundwork). However, they reported the contrasting results in terms of whether an effect was reliably detected for one measure and not another, rather than in terms of the magnitudes. Even if two different methods both detect an effect, it matters that they provide substantially different estimates of its size. That could be the difference between whether we should care about it or not.
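A rough sketch of that last point, using made-up numbers rather than anything from the paper in question: suppose two valid ways of measuring the same exposure yield the two hypothetical data sets below. The function computes a risk ratio with an approximate 95 percent confidence interval using the common log-based method; the function name and all counts are assumptions for illustration.

```python
import math

def risk_ratio_with_ci(exposed_cases, exposed_total,
                       unexposed_cases, unexposed_total, z=1.96):
    """Risk ratio with an approximate 95% CI (log-based standard error)."""
    rr = (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)
    se = math.sqrt(1 / exposed_cases - 1 / exposed_total
                   + 1 / unexposed_cases - 1 / unexposed_total)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper

# Hypothetical data under two different measurements of the same exposure.
rr_a, low_a, high_a = risk_ratio_with_ci(150, 1000, 100, 1000)
rr_b, low_b, high_b = risk_ratio_with_ci(300, 1000, 100, 1000)

# The yes-no framing: an effect is "detected" if the interval excludes 1.
detected_a = low_a > 1
detected_b = low_b > 1

print(f"measure A: RR={rr_a:.2f} ({low_a:.2f}, {high_a:.2f}) detected={detected_a}")
print(f"measure B: RR={rr_b:.2f} ({low_b:.2f}, {high_b:.2f}) detected={detected_b}")
```

Reported as yes-no, both measures say the same thing: an effect was detected. Reported as magnitudes, one says the risk is about one and a half times as high and the other says it is about three times as high.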

Tobacco controllers cynically take advantage of this common misinterpretation of epidemiology results. They condemn vaping as not “harmless,” though it is inevitable that any exposure harms at least one person. They say “we cannot be sure that vaping causes no risk of D,” though it is impossible to ever be sure of that (and, indeed, it is almost never true). They claim a particular causal relationship exists (e.g., that product use causes a disease or a gateway), as if that mere fact were interesting, without offering the quantification that actually matters.

Epidemiology results are reported as if they were the discovery of a new particle, but they are really more like measuring the speed of light. When reading a headline, imagine that it was about the latter. It should be immediately apparent that the claim, “new study shows that light has speed,” is utterly useless information.

Follow Dr. Phillips on Twitter