Lying with Statistics Part III: the one-sided vs. two-sided test

This brand of lying with statistics comes from either misunderstanding how to interpret statistics or just simple dishonesty. It looks legitimate, but underneath it’s rotten to the core.

This issue of one-sided vs. two-sided tests is a little hard to explain without some statistical background. In the context of a simple drug trial, a one-sided test decides between two hypotheses: your drug is no better than placebo vs. your drug is better than placebo. A two-sided test decides between these two: your drug is equivalent to placebo vs. your drug is not equivalent to placebo.

Oddly enough, the two-sided test is harder, because you are testing whether your drug is either worse than or better than placebo. The one-sided test is in many respects easier: if you expect your drug to have some advantage, it has a lower threshold to meet to be called "better."

So here's how the simplest case goes. You take the average outcome in the drug group and subtract the average outcome in the placebo group. If the difference is large enough, your drug "wins," i.e. it can be said to be better than placebo. For a one-sided test, the difference usually has to be at least 1.645*standard error (SE; a standard error is a measure of how precise your average estimate is: the larger it is, the less precise your average). For a two-sided test, it usually has to be at least 1.96*SE (or smaller than -1.96*SE, but then your drug is worse). These strange numbers are chosen so that, if your drug has the same effect as placebo, the trial will falsely let your drug "win" only 5% of the time.
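Those two thresholds fall straight out of the standard normal distribution. A quick sketch using Python's standard library (`statistics.NormalDist`) shows where 1.645 and 1.96 come from:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution

# One-sided test at the 5% level: the entire 5% false-positive
# budget goes into the upper tail.
one_sided = z.inv_cdf(0.95)   # ~1.645

# Two-sided test at the 5% level: the budget is split,
# 2.5% into each tail.
two_sided = z.inv_cdf(0.975)  # ~1.96

print(f"one-sided threshold: {one_sided:.3f}")
print(f"two-sided threshold: {two_sided:.3f}")
```

Splitting the same 5% budget across two tails is exactly why the two-sided cutoff (1.96) is larger than the one-sided one (1.645).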

The FDA usually requires two-sided tests for drugs that are to be marketed, i.e. you have to meet the 1.96*SE threshold to "win." (The FDA acknowledges this is conservative, and, in fact, this policy has the acknowledged side effect of a 2.5% chance of a false "win" in the drug's favor.) Without such rules, however, people may use one-sided tests in other circumstances. That's OK, as long as it's laid out in advance.

However, unscrupulous researchers may prespecify a two-sided test, then discover after collecting the data that they won't meet the two-sided threshold but will meet the one-sided one. So they report the one-sided test instead.

Aside from ethical problems, what statistical problems are there with this strategy?

The main one is that every hypothesis test has a decision rule, and the decision rule is an integral part of the statistical calculations. For a one-sided test, the decision rule is: if the difference between the drug group's and the placebo group's average outcomes exceeds 1.645*SE, declare a win. For a two-sided test, the decision rule is: if that difference is either greater than 1.96*SE or more negative than -1.96*SE, declare a win.
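The two rules can be written out explicitly. A minimal sketch (hypothetical function names, with the difference and SE passed in directly):

```python
def one_sided_win(diff, se):
    """Drug 'wins' if the drug-minus-placebo difference exceeds 1.645 SE."""
    return diff > 1.645 * se

def two_sided_win(diff, se):
    """Drug 'wins' (i.e. differs from placebo) only if the difference
    falls outside +/-1.96 SE in either direction."""
    return diff > 1.96 * se or diff < -1.96 * se

# Example: a difference of 1.8 SE passes the one-sided rule
# but fails the two-sided one -- exactly the gap the
# unscrupulous researcher exploits.
print(one_sided_win(1.8, 1.0))  # True
print(two_sided_win(1.8, 1.0))  # False
```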

The decision rule the unscrupulous researcher above actually uses is: if the difference is more negative than -1.96*SE, or greater than 1.645*SE, declare a win. If you do the statistical calculation, you'll find a 7.5% false positive rate rather than 5%.
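You can verify that 7.5% figure directly. Under the null hypothesis (drug equals placebo), the difference divided by its SE is approximately standard normal, so the mixed rule's false positive rate is P(Z > 1.645) + P(Z < -1.96). A quick check with the standard library:

```python
from statistics import NormalDist

z = NormalDist()

upper = 1 - z.cdf(1.645)  # ~0.05: the one-sided upper tail
lower = z.cdf(-1.96)      # ~0.025: the two-sided lower tail

# The cherry-picked rule pays for both tails at once.
print(f"false positive rate: {upper + lower:.3f}")  # ~0.075
```

In other words, the researcher has quietly combined the generous tail of one test with a tail of the other, and the error rate is the sum of the two.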

Moral of the story, for all statistical analyses: say what you are going to do, and then do what you say, or all your hard work at controlling your error rate goes out the window. For results reported in journal articles or elsewhere, make sure the authors said what they were going to do and then did what they said.
