Statistician: Daily Kos has every right to be furious

Markos Moulitsas has every right to be furious. He undoubtedly paid Research 2000 a large amount of money to conduct weekly political opinion polls, and the evidence is overwhelming that he did not get what he paid for. There is no doubt that the reported results cannot be the outcome of simple random sampling. What is less clear is whether the results are actually fraudulent.

The research of Mark Grebner, Michael Weissman, and Jonathan Weissman is an excellent example of data detective work. The technical details are complex, but the basic premise is simple: random samples are subject to chance variation, yet when executed properly they are nevertheless governed by well-understood and inviolable mathematical rules. Research 2000’s weekly poll results display at least three characteristics that violate these rules.

First, the week-to-week chance variation is too small: it is much less than what would be expected under standard assumptions about random samples. Second, the sample percentages in the surveys change almost every week. Odd as it may seem to non-statisticians, this is very unusual: sample surveys should fairly frequently produce the same result in consecutive weeks, even when there is a considerable margin of error. Finally, there appears to be a strong relationship between results that should be unrelated.
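To make the second point concrete, here is a minimal sketch of how often two independent simple random samples would report the same rounded percentage. The group size of 600 and the true approval rate of 50% are hypothetical values chosen for illustration; the actual Research 2000 sample sizes are not given here, and the function names are my own.

```python
from math import comb

def rounded_pct_distribution(n, p):
    """Exact distribution of the rounded sample percentage
    for a simple random sample of size n with true proportion p."""
    dist = {}
    for k in range(n + 1):
        prob = comb(n, k) * p**k * (1 - p)**(n - k)
        # Round 100*k/n to the nearest whole percent (halves round up),
        # using integer arithmetic to avoid floating-point surprises.
        pct = (200 * k + n) // (2 * n)
        dist[pct] = dist.get(pct, 0.0) + prob
    return dist

def prob_same_report(n, p):
    """Chance that two independent samples report the same rounded percentage."""
    dist = rounded_pct_distribution(n, p)
    return sum(q * q for q in dist.values())

# Hypothetical inputs, not taken from the polls themselves.
N_HYPOTHETICAL = 600
P_HYPOTHETICAL = 0.5

same = prob_same_report(N_HYPOTHETICAL, P_HYPOTHETICAL)
print(f"Chance of an unchanged reported percentage: {same:.1%}")
```

Under these assumed inputs the chance of no change from one week to the next comes out somewhere around one in seven, so a long series in which the reported figure almost never repeats would be surprising for genuine random samples.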

Are the samples therefore fraudulent? Has Research 2000 simply made up the numbers? We may never really know. We only really know what the numbers are not: the sample percentages provided by Research 2000 and reported on the Daily Kos were not the results of simple random samples. But statistics can only be used to contradict; it cannot confirm. So, while we know for sure that something is wrong with the data, we cannot know for sure what that something is. One possibility is fraud. A second is stupidity. Perhaps both are true.

One way the statistical detective can detect fraud is to check whether the data match common but incorrect intuitions about chance behavior. To see this, consider what would happen if you were to toss a fair coin 100 times. Most people believe that the coin is more likely than not to land heads exactly 50 times. This is in fact false. While 50 heads is indeed the single most likely outcome, the count is more than 10 times as likely to be something else.
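The coin-toss claim can be checked directly from the binomial formula. The short sketch below (the function name is my own) computes the chance of exactly 50 heads in 100 fair tosses and the odds that the count is anything else.

```python
from math import comb

def prob_exact_heads(n_tosses, n_heads):
    """Probability of exactly n_heads heads in n_tosses fair coin tosses."""
    return comb(n_tosses, n_heads) / 2**n_tosses

p50 = prob_exact_heads(100, 50)
odds_against = (1 - p50) / p50

print(f"P(exactly 50 heads) = {p50:.4f}")
print(f"Odds of some other count: {odds_against:.1f} to 1")
```

The probability of exactly 50 heads is just under 8%, so some other count is between 11 and 12 times as likely, in line with the order-of-ten figure quoted above.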

Those who falsify the results of chance experiments often produce data that match their own intuitions and not reality. The most famous example of this is the great work of Mendel, whose experiments helped to establish the theory of genetic heritability. The statistician R.A. Fisher observed that Mendel’s second-generation ratio of dominant to recessive phenotypes was implausibly close to 3-to-1. From this we know that Mendel almost surely massaged his data, in some way, to confirm his hypothesis. Of course, Mendel was right about the genetics, so we forgive his indiscretion.

Now we cannot be sure that Mendel knowingly engaged in fraud. Nor can we know what actually happened with Research 2000, although it will surely help to see the raw data. Grebner, Weissman and Weissman have alleged that Research 2000 has telltale characteristics of non-random number generation. I am not so sure. If Research 2000 were indeed making up their data, one would think they would do a better job of covering their tracks.

For example, it is very odd that if the sample percentage in any given week is odd (or even) for the women, it is almost always odd (or even) for the men as well. This can’t happen in a random sample. But there is no reason for this to appear in fraudulent data either. It is plausible, if not more likely, that this improbable “parity” is due to a programming glitch that mishandles the rounding of fractions in some way. For example, suppose that 50 men and 50 women were asked if they approved of the President’s performance. No matter what the results of the survey, the observed percentages must be even for both men and women (since samples of size 50 must produce percentages in multiples of 2%). Now, suppose that 51 men and 51 women were sampled, and suppose further that an understandable programming error was introduced whereby percentages were always rounded down to the nearest whole number. Each sample percentage would then fall somewhat below a multiple of 2%, so whenever at least one but fewer than half of a group approved, the rounded-down figure would be odd, and whenever a majority approved it would be even. The reported percentages for the men and the women would therefore share the same parity whenever both groups fell on the same side of 50%.
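The rounding scenario is easy to enumerate. The sketch below is purely illustrative: the group sizes of 50 and 51 come from the hypothetical example above, the truncation rule is the assumed programming error, and nothing here is known to reflect Research 2000's actual software.

```python
def reported_pct_floor(k, n):
    """Percentage reported when k of n respondents approve and the
    (hypothetical) buggy code always rounds down to a whole number."""
    return (100 * k) // n

def parity_label(value):
    return "odd" if value % 2 else "even"

for n in (50, 51):
    odd_counts = [k for k in range(n + 1) if reported_pct_floor(k, n) % 2 == 1]
    if odd_counts:
        print(f"n={n}: odd reported percentage for k = "
              f"{odd_counts[0]}..{odd_counts[-1]}, even otherwise")
    else:
        print(f"n={n}: reported percentage is always even")

# Two groups of 51 on the same side of 50% report matching parity.
men, women = reported_pct_floor(20, 51), reported_pct_floor(23, 51)
print(men, parity_label(men), "|", women, parity_label(women))
```

In this sketch, a group of 50 always reports an even percentage. A group of 51 with truncation reports an odd percentage when between 1 and 25 respondents approve and an even one from 26 upward. Either way, two groups on the same side of 50% end up with matching parity, which is the kind of artifact a rounding bug could produce without any fabrication.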

So what do we really know? Something is surely wrong with the reported data. Are the datasets fraudulent or badly processed? Without actually examining the data, we cannot be sure. We may never know. In my view, there is undue emphasis placed on polling. A truly high-quality research poll is very expensive and time-consuming. The product that Research 2000 agreed to provide, and that Markos Moulitsas alleges was not provided, was to my mind flawed in concept as well as in execution.

Abraham Wyner is a professor of statistics at the Wharton School of the University of Pennsylvania. His principal research focus at Wharton has been in applied probability, information theory, statistical learning and baseball. He has published more than 30 articles in leading journals in many different fields.