Daily Vaper

Science Lesson: Model Shopping – The Real Problem With Epidemiology


Carl V. Phillips Contributor

Before data turns into study results, it must be run through a statistical model chosen by the researchers. While it is possible to keep this simple, that is still a choice; the data never speaks for itself. The choices matter, and so “shopping around” among models creates opportunities for typical minor fudging and occasional out-and-out lying. Understanding this is useful for anyone who wants to understand the research about vaping — or about nutrition, environmental pollutants or almost any other health-related research in the news.

For a few types of studies the best statistical model is fairly obvious. For a randomized experiment, researchers just count up the outcome events in each group. Other bells and whistles can also be reported, but if that simple statistic is not reported, it becomes fairly obvious that the researchers are attempting to mislead. But for most epidemiology, and social science more generally, there is no simple obvious choice, and so model choices will typically create misleading results without it being obvious.

For example, consider incorporating age in an epidemiological model, which is usually necessary because age has a large effect on most outcomes. The researchers might put subject age into the statistical calculation as a continuous variable, forcing the model to make particular “assumptions”: for example, that the effect of being 40 rather than 30 is the same as the effect of being 50 rather than 40. Or the researchers can divide subjects into two or more age groups (e.g., 18-25, 26-40, etc.), and then either include a variable for each or analyze each group separately (called “stratifying”).
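To make that concrete, here is a small sketch with invented numbers (not from any real study) contrasting the two codings of age. The continuous coding builds the equal-decade-steps assumption right into the arithmetic; the grouped coding does not.

```python
# Hypothetical illustration with invented numbers: two common ways to put
# age into a model, and the assumption the continuous coding imposes.

def linear_rate(age, intercept=5.0, slope=2.0):
    # Continuous coding: the rate (per 1,000) rises by `slope` for every
    # year of age, so the model forces the 30-to-40 effect to equal the
    # 40-to-50 effect by construction.
    return intercept + slope * age

def grouped_rate(age):
    # Categorical coding (one rate per age band): no relationship between
    # bands is assumed; each is estimated on its own.
    bands = {(18, 25): 8.0, (26, 40): 20.0, (41, 65): 90.0}
    for (lo, hi), rate in bands.items():
        if lo <= age <= hi:
            return rate
    raise ValueError("age outside modeled range")

# The continuous model's decade steps are identical by construction:
step_30_40 = linear_rate(40) - linear_rate(30)
step_40_50 = linear_rate(50) - linear_rate(40)
```

Neither coding is inherently right; the point is that each is a researcher's choice, and each builds different assumptions into the result.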

The exposure itself can be measured in different ways (e.g., ever tried vaping, vaped at least once recently, vapes every day), as can the outcome. All available data can be analyzed, or just some of it. For most variables, the researchers also have the option of simply leaving them out of the model. There can be hundreds of choices to make, many of which offer literally infinite options. So which one do researchers choose?

Ideally researchers would decide on the right model based on theory and previous results, without looking at their data first, and then run it. While this is not completely practical, it is possible to do a reasonable imitation of it. In practice, however, researchers usually run many plausible statistical models and cherrypick one that produces the “best” results for their data. They then report it as if it were the only model they considered, often backfilling a story for why it seemed right from the start. Since the model that produces the strongest associations in the data is usually considered “best,” this means research consistently overstates the associations. It is akin to choosing a photograph from a few snapshots because it makes you look most attractive. This has obvious advantages, but presenting a valid estimate of your average attractiveness is not one of them.

Sometimes the problem is not a mere creep toward overstating relationships. For junk science researchers like tobacco controllers, the common practice is to search through every possible model option, even those that are not plausible candidates for the best model, to find the one that produces the result that best supports their political goals.
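A small simulation (hypothetical, not based on any real dataset) shows how this inflation works even when there is nothing to find: the exposure and the outcome below are generated independently, so the true effect is exactly zero, yet reporting only the most dramatic of many candidate model choices yields a seemingly real association.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def simulate_null_study(n=200):
    # Exposure (continuous) and outcome (yes/no, 20% baseline risk) are
    # drawn independently, so the true association is exactly zero.
    return [(random.random(), random.random() < 0.2) for _ in range(n)]

def risk_difference(data, cutpoint):
    # One "model choice": dichotomize the exposure at `cutpoint` and
    # compare outcome risk between the resulting two groups.
    exposed = [y for x, y in data if x >= cutpoint]
    unexposed = [y for x, y in data if x < cutpoint]
    if not exposed or not unexposed:
        return 0.0
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)

data = simulate_null_study()
cutpoints = [i / 20 for i in range(1, 20)]  # 19 candidate model choices
estimates = [risk_difference(data, c) for c in cutpoints]

# An honest analyst prespecifies one cutpoint and reports its estimate.
# A model shopper reports only whichever of the 19 looks most dramatic,
# overstating an effect that does not exist at all.
shopped = max(estimates, key=abs)
```

Run enough candidate models against noise and some of them will look interesting; report only those, and the literature fills with overstated effects.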

This problem has become much talked-about recently, mostly in psychology and other social sciences (the epidemiology field is still largely silent about it). It goes by many names, but the most explanatory is “unreported multiple hypothesis testing.” When I first started studying the problem, when few people were talking about it, I labeled it “publication bias in situ.” Publication bias is often thought of as the “file drawer effect,” in which studies with boring or seemingly wrong results are filed away rather than appearing in journals. This causes the literature as a whole to be biased toward “interesting” and “proper” results. But far more publication bias takes place when only a biased choice from among many different models is reported. This bias exists in situ — within each study report itself — rather than only at the level of the literature as a whole.

The discussion of this problem often focuses on fiddling with model choices to make a result become “statistically significant.” But this is the wrong way to look at it. Epidemiology is a science of measurement, not a way to discover whether a phenomenon merely exists. Of course it is misleading to claim an exposure causes a statistically significant doubling of a disease risk when it really has no effect, and this may lead to bad choices. But it is equally misleading and harmful to claim it quadruples the risk when it really only doubles it. It is a reasonable rule of thumb to assume that if “the literature” shows that the risk caused by an exposure is X, the real value is much closer to half of X.

This is not just a problem with blatantly dishonest researchers. Almost every researcher I have ever observed in action, including those who are trying to do honest research, does this. They rationalize it based on needing to “listen to the data” or not knowing what model to use until they see which one “best fits the data.” But what this means is that they are usually picking a model that makes the relationships in their data seem as strong as possible.

When blatantly dishonest researchers intentionally take advantage of this, the problem becomes much worse. I recently wrote in detail about a paper by Stanton Glantz from a year ago, in which he claimed that smoking rates among U.S. minors did not decrease as their vaping increased. The authors picked a dataset and made the absurd assumption that smoking prevalence would have followed a particular linear decline after the introduction of vaping, even though historical data clearly showed no such trend. Glantz and other tobacco controllers play lots of games with the truth, but this is probably their worst crime against science: starting with a conclusion they want to support and then hunting for the data and a concocted model that can be used to best support it.
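The trick can be sketched in a few lines. The numbers below are invented for illustration only — they are not the actual survey data — but they show the mechanism: assume a counterfactual decline that the historical data never exhibited, then blame the resulting “gap” on the exposure.

```python
# Invented numbers for illustration (NOT the actual survey data):
# smoking prevalence (%) over eight years, essentially flat throughout,
# with the exposure introduced at year 3.
observed = [15.1, 15.0, 14.9, 15.1, 15.0, 14.8, 14.9, 15.0]

def assumed_counterfactual(start, years_after, decline_per_year=1.0):
    # The concocted model's assumption: prevalence "would have" fallen
    # linearly after year 3 — despite no decline in the earlier data.
    return start - decline_per_year * years_after

# Gap between what actually happened and what the model assumes should
# have happened. Positive gaps get blamed on the exposure, "showing"
# that it slowed a decline that never existed in the first place.
gaps = [observed[3 + t] - assumed_counterfactual(observed[3], t)
        for t in range(1, 5)]
```

Because the assumed decline is pure invention, every year mechanically produces a positive gap, and the conclusion was baked in before any data were consulted.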

They get away with it because readers never get to see results from unreported models, and can seldom get the data to run other models themselves, so they do not have affirmative evidence that the single reported model produced biased results. This is also true for journal reviewers, who see only what the readers see. (When I review a paper for a journal I always insist that the authors report — at least for purposes of doing the review — the results of the other models they ran. It never happens.) Even when something about the reported model seems quite odd, there is little a reader can do beyond noting the oddity.

Sadly, most critics of the Glantz analysis quibbled about details rather than pointing out that the model was an absurd concoction from the start. Because picking favorable models is normal behavior, and often impossible to prove, people in public health and related sciences seldom notice an absurd model even when they are criticizing its results. Moreover, researchers know that admitting this dirty secret of the field — that model choices are biased — could be a threat to their own credibility.

Follow Dr. Phillips on Twitter
