When statistics can’t tell the truth, a followup application to the Vioxx controversy

I’ve avoided posting on the Vioxx controversy for a long time, but I would be remiss if I discussed drug safety without discussing the hot-button issue of the day that has brought drug safety to the forefront.

My earlier thesis was:

Clearly, closely adhering to the rules of statistics isn’t going to get anyone very far in drug safety analysis until we develop new methodology.

The Vioxx controversy (accusations of the “head in the sand” approach aside) highlights this issue.


These graphs appeared in a recent New York Times<footnote>Yeah, Friday, I’m ripping you off. So, to assuage my heavy burden of guilt, I’ll tell people to visit regularly. Not that you need any traffic from this humble little blog.</footnote>. They are modified Kaplan-Meier graphs, which are designed to show the risk of an event (here, a heart attack) over time and to compare the difference in risk between two groups. Notice that the placebo group almost categorically has a smaller risk over time than the Vioxx group. However, note also the error bars, which indicate 95% confidence intervals for the risk at a single point in time. The error bars do not separate until 36 months in either graph.
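For anyone curious where error bars like these come from, here is a minimal sketch of the plain Kaplan-Meier estimator with pointwise 95% intervals from Greenwood's variance formula. It is only an illustration: the published graphs are "modified" versions and I don't know exactly what adjustments were made, and the function name `kaplan_meier` and the toy data are my own inventions. The graphs plot risk, which is simply one minus the survival estimate computed here.

```python
import math

def kaplan_meier(times, events, z=1.96):
    """Return [(time, survival, lower, upper)] at each distinct event time.

    times  : follow-up time for each subject
    events : 1 if the event (e.g. a heart attack) was observed, 0 if censored
    z      : normal quantile for the pointwise interval (1.96 gives ~95%)
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    greenwood = 0.0  # running sum inside Greenwood's variance formula
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = 0  # events observed at time t
        c = 0  # subjects leaving the risk set at t (events plus censored)
        while i < len(data) and data[i][0] == t:
            d += data[i][1]
            c += 1
            i += 1
        if d > 0:
            surv *= 1 - d / n_at_risk
            if n_at_risk > d:
                greenwood += d / (n_at_risk * (n_at_risk - d))
            se = surv * math.sqrt(greenwood)
            lower = max(0.0, surv - z * se)
            upper = min(1.0, surv + z * se)
            out.append((t, surv, lower, upper))
        n_at_risk -= c
    return out

# Toy data, made up purely for illustration: four subjects, one censored at t=2.
km = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
for t, s, lo, hi in km:
    print(f"t={t}  risk={1 - s:.3f}  95% CI for risk=({1 - hi:.3f}, {1 - lo:.3f})")
```

Note that each interval describes the uncertainty at one time point only, which is why comparing the bars across groups by eye is a rough tool at best.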

However, waiting for the error bars to separate is not a conservative approach. That we don’t have the resolution to discern differences in safety doesn’t mean we can conclude they aren’t there. In fact, in this case, we might be better served to conclude that the risk gets significantly higher at 18 months in the first graph, and at three or four months (with a huge jump at 18 months) in the second. This may not be borne out by the strict statistical evidence as shown in the graphs, but, if we are to err on the side of caution, we can’t wait until the point where the error bars separate.
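A small numerical sketch shows why eyeballing error bars is a blunt instrument. The numbers below are invented for illustration and have nothing to do with the actual Vioxx data; I also assume two independent groups and a normal approximation. Two 95% intervals can overlap even when a direct test of the difference between the groups clears the usual 1.96 threshold.

```python
import math

def intervals_overlap(p1, se1, p2, se2, z=1.96):
    """True if the two pointwise intervals p +/- z*se share any values."""
    lo1, hi1 = p1 - z * se1, p1 + z * se1
    lo2, hi2 = p2 - z * se2, p2 + z * se2
    return lo1 <= hi2 and lo2 <= hi1

def difference_z(p1, se1, p2, se2):
    """z statistic for the difference of two independent estimates."""
    return (p2 - p1) / math.sqrt(se1 ** 2 + se2 ** 2)

# Hypothetical risks at one time point: 10.0% vs 13.5%, each with SE of 1%.
overlap = intervals_overlap(0.100, 0.010, 0.135, 0.010)
z_stat = difference_z(0.100, 0.010, 0.135, 0.010)
print(overlap, round(z_stat, 2))
```

Here the bars overlap, yet the z statistic for the difference is above 1.96, so "the error bars haven't separated" is a weaker statement than it looks.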

There is even more to this story from the statistical perspective. These two graphs are from two different studies, and two grossly different conclusions were drawn (i.e., from the first, that the risk from Vioxx increases after 18 months, and from the second, that the risk from Vioxx increases after 4 months). Repeatability is an important principle of science, but as it turns out, studies are notoriously hard to repeat. This is why the FDA usually requires two separate confirmatory trials for a marketing application to be approved. In this case, we had two relatively large studies with wildly different conclusions about an important safety characteristic of a major blockbuster drug. Clearly, there’s more to statistical analysis than getting a p-value.

A serious problem here is where the rubber meets the road in the courtroom. We biostatisticians (and the few who will listen to us when we explain multiple comparisons) understand the problems with trying to prove a drug safe, and anyone who has spent any amount of time analyzing pharmaceuticals can tell you that no drug is safe. And yet our standards are understandably high. When the FDA reviews a drug, it looks at risk and benefit. The public, and indeed juries and judges, are trying to do the work of FDA statisticians, medical reviewers, toxicologists, and regulatory and legal experts, and have to weigh a considerable amount of information that’s really hard to digest (or, if the plaintiff’s lawyers are good, just weigh the ‘unsafe and they knew it’ info higher than the defense’s).
