The hierarchy of medical evidence

Pat Sullivan, Jr. discusses in some recent posts the roles of different types of medical evidence. First, let me get an issue out of the way concerning the definition of “anecdote,” since it seems to get in the way. From Sullivan’s entry:

bq. Eye-witness testimony, corroborated by medical records, videotapes, etc. should NOT be given the label “anecdotal” because that label is technically incorrect.

Turns out, the dictionary agrees with him…

anecdotal: Based on casual observations or indications rather than rigorous or scientific analysis

As I see it, there are three levels of individual evidence, in order from least to most credible:

* paid testimonials, including celebrity endorsements
* testimonials freely offered, including anecdotes
* carefully documented and corroborated case studies

So, now, onto the sentence that caught my eye:

bq. the anecdotal evidence of one case (like Kevin Champagne’s or Scott Shoemaker’s) is more reliable as proof when building a “factual matrix” than epidemiological studies and randomized controlled studies because these have been demonstrably corruptible. (Please note that I am NOT saying all epidemiological studies and RCTs are corrupt! Only that by their very nature, they can easily fall prey to corruption, bias, etc. and the Millers’ article demonstrated that.)

Wow. That seems very counterintuitive: that evidence collected from one person could be _more_ convincing than evidence collected from many people. So, let’s examine what can “go wrong.” What are the features of an RCT?

_Randomization_ refers to the process of assigning patients randomly to a treatment in a study. (And, as noted below, in such a trial there are always at least two “treatments.”) The process may be as simple as a coin flip or as complex as balancing other factors (such as sex) by changing the assignment probabilities on the basis of how many people are already receiving each treatment.
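Both schemes can be sketched in a few lines of Python. This is a minimal illustration, not production trial software; the patient identifiers, arm names, and block size of two are all invented for the example.

```python
import random

def simple_randomize(patients, arms=("treatment", "placebo"), seed=None):
    """Assign each patient to an arm by an independent 'coin flip'."""
    rng = random.Random(seed)
    return {p: rng.choice(arms) for p in patients}

def stratified_randomize(patients_by_stratum, seed=None):
    """Balance a factor (e.g. sex) by randomizing within each stratum,
    using permuted blocks of size 2 so the arms stay nearly equal."""
    rng = random.Random(seed)
    assignment = {}
    for stratum, patients in patients_by_stratum.items():
        for i in range(0, len(patients), 2):
            block = ["treatment", "placebo"]
            rng.shuffle(block)  # random order within the block
            for patient, arm in zip(patients[i:i + 2], block):
                assignment[patient] = arm
    return assignment
```

The stratified version guarantees that, within each stratum, the arms never differ in size by more than one, which is the balancing act described above.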

_Controlled_ refers to the comparison of the treatment in question to a current established treatment (or placebo), called a “control.” Choice of control is up to the sponsor of the trial, but for drug development it has to be at least placebo (if ethical), if not a currently approved therapy. Comparing to placebo means that the trial factors out the beliefs of the patient and doctor about the treatment effect.

In most RCTs, the treatment assignments are “blinded,” or “masked”: hidden from the patient and often from the doctor as well. In fact, this is the preferred way of doing things, because not blinding may allow more bias into the determination of the treatment effect. Sometimes it simply isn’t possible, such as when comparing surgery to a pharmaceutical.

Finally, at least in a drug development program, many people look at the data and the results. Formal quality assurance procedures have to be in place to make sure that the data is accurate, and that any data manipulations and analyses are scientifically appropriate. These people come from the pharmaceutical manufacturer, the FDA, and often an outside contract research organization (CRO). Sometimes companies will hire outside quality control experts for another layer of protection. Furthermore (and this is true for all publicly funded studies, as far as I know), a data safety monitoring committee reviews data during the conduct of the trial to make sure that patients are kept safe and the drug isn’t causing people any problems.

It sounds like a foolproof system, but it isn’t. It _is_ hard to commit fraud (or at least fraud is very easily found out), but even this system of proof has its weak spots, and putting too much trust in the system (as with any system) without respecting its limitations leads to situations like Vioxx™ and Baycol™. And just what are those limitations? They are subtle, so let’s try to look at them carefully.

_Statistics are good at measuring population tendencies and are horrible at predicting individual outcomes._ The two most heavily used classes of theorems in statistics are the laws of large numbers and the central limit theorems. These speak to the convergence of sample averages to population means and to the distribution of those estimates, but they say nothing about the result of any individual. Prediction of individual response is notoriously hard, and even harder in the life sciences than in, say, industrial statistics.
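A quick simulation makes the gap concrete. The 30% response rate below is an invented number for illustration: the population rate is easy to pin down from many observations, yet the best possible guess about any single patient is still wrong almost a third of the time.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical Bernoulli "response": each patient becomes symptom-free
# with probability 0.3 (an assumed rate, purely for illustration).
true_rate = 0.3
outcomes = [1 if rng.random() < true_rate else 0 for _ in range(100_000)]

# Law of large numbers: the sample mean hugs the population mean.
sample_mean = statistics.mean(outcomes)

# But the best individual-level prediction (always guess the majority
# outcome) is still wrong about 30% of the time.
best_accuracy = max(sample_mean, 1 - sample_mean)
```

Knowing the population rate to three decimal places does nothing to tell you which side of the coin a particular patient will land on.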

_Statistics tests between two mutually exclusive hypotheses._ These hypotheses are formed based on the best theory available. The results are only as good as the hypotheses.

_Statistics can only operate on the data that is available._ In the conduct of clinical trials, people are very careful to validate the data every step of the way, from recording, to data entry, to any manipulation, to analysis and summarization. Efforts are even made to standardize measurements of lab results, vital signs, and blood plasma concentrations. However, data is only as good as the instruments, which serve only to enhance the human senses. You can see the point even more clearly in osteoarthritis trials, where some of the endpoints are interpretations of radiographic images (x-rays). In these trials, an x-ray is taken, and, in well-designed trials, the images are sent off to a trusted, trained, masked, independent interpreter who records the interpretation into a carefully tracked system. Indeed, a very good system, but not perfect, and open to subjectivity.

Even further, the endpoint needs to be appropriate, and choosing one leaves room for subjectivity. Iressa (AstraZeneca’s cancer drug, recently in the news) illustrates this point (and also some ways of handling the issue). The FDA approved the compound on the basis that it reduced tumor size. However, the approval was conditional, and AZ had to submit survival data. As it turns out, the survival of patients taking Iressa was no different from that of patients on standard of care or control, so the drug was removed from the market.

Complex conditions make this even harder. In the referenced article, Sullivan and his commenters question whether there are not two different (or even more) conditions we call “autism.” One might be more accurately called “mercury poisoning” (possibly including a predisposition to autism-like symptoms, or maybe not), and the other is more traditionally what we think of as autism. Who knows, there may even be more.

_Selection bias._ This isn’t an issue for drugs that are under development, for the FDA really tries hard to have access to drugs that are submitted for a marketing application. However, studies done outside the framework of pharmaceutical, biologic, or device development (with the exception of some post-marketing commitments and safety reporting) are subject to selection bias. It’s really hard to gauge the effect of this bias, but basically the theory goes that, in general, only successful results get published. This does have an impact on approved drugs in that pharmaceutical representatives are allowed to hand out papers even though they are not allowed to discuss so-called “off-label” uses of drugs. Less directly, publications involving a particular compound enhance the visibility and credibility of a product, its manufacturer, and the listed authors.
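A toy simulation (every number below is invented) shows how “only successful results get published” can manufacture an effect even when a treatment does nothing at all:

```python
import random
import statistics

rng = random.Random(7)

# 500 small studies of a treatment with NO real effect (true mean 0).
# Each "study" reports the mean of 20 noisy measurements (sd = 1).
studies = [statistics.mean(rng.gauss(0, 1) for _ in range(20))
           for _ in range(500)]

# Suppose only "significant" positive results get published: a study
# mean beyond ~1.96 standard errors, where se = 1 / sqrt(20).
cutoff = 1.96 / 20 ** 0.5
published = [m for m in studies if m > cutoff]

# Averaging every study recovers roughly zero; averaging only the
# published studies shows a spurious positive effect.
all_mean = statistics.mean(studies)
published_mean = statistics.mean(published)
```

The literature-reader who only ever sees `published` has no way to know the underlying effect is zero, which is exactly why the bias is so hard to gauge after the fact.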

There are other issues that affect the credibility of a scientific study. The credentials, experience, and popularity of the investigator all affect how a study is perceived. While reasonable, this does not imply that everything a popular, seasoned investigator says is true, or that everything a fresh, off-the-beaten-path investigator says is false. However, the standards of rigorous verifiability are applied differently to the two.

_Statistical assumptions._ To describe this, let’s look at the “simplest” case, where we are trying to determine whether a drug reduces the occurrence of a symptom in a sick population. To conduct this trial, we typically randomize symptomatic patients into two groups: one getting a placebo and one getting the drug. We then measure the number of people who are symptom-free by the end of the trial and take the difference of the percentages in the treatment and placebo groups. Then we can test whether this difference is greater than 0.
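That difference-of-percentages comparison is commonly done with a two-proportion z-test, sketched below. The trial counts (60 of 100 symptom-free on drug, 45 of 100 on placebo) are invented for the example.

```python
from math import erf, sqrt

def two_proportion_z_test(x_t, n_t, x_p, n_p):
    """One-sided z-test that the treatment response rate exceeds the
    placebo rate, using the pooled standard error under the null."""
    p_t, p_p = x_t / n_t, x_p / n_p
    pooled = (x_t + x_p) / (n_t + n_p)
    se = sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_p))
    z = (p_t - p_p) / se
    # One-sided p-value from the standard normal CDF,
    # Phi(z) = 0.5 * (1 + erf(z / sqrt(2))).
    p_value = 1 - 0.5 * (1 + erf(z / sqrt(2)))
    return p_t - p_p, z, p_value

# Hypothetical trial: 60/100 symptom-free on drug vs. 45/100 on placebo.
diff, z, p = two_proportion_z_test(60, 100, 45, 100)
```

Note that everything this test reports is a statement about the two group rates, which is precisely why the assumptions that follow matter.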

To conduct this analysis, we have to assume the following:

# The people in the placebo group respond the same to placebo (_i.e._ have the same probability of becoming symptom-free), so that we can calculate a placebo rate
# The people in the treatment group respond the same to treatment, so that we can calculate a treatment rate
# The people in the placebo group would respond the same to treatment as the people in the treatment group if they were given it, and vice versa (so the difference in rates makes sense)
# The observation of any one patient is independent of observations of other patients

We can try to account for factors that cause violations of any of these assumptions, but the fact is that even when we account for gender, race, height, weight, microarray results, or whatever else might influence outcome, these assumptions don’t quite hold. At the end of the day we statisticians throw up our hands and say “this is the best we can do.” Our results are pretty good as a population description. They’re pretty bad for individual prediction.

Oh, and if you’re familiar with central limit theory, let me remind you that the central limit theorems do *not* allow you to assume that everybody’s normal. You only get to assume that the averages are random variables that converge in distribution to normality, with mean equal to the population mean and standard deviation equal to the population standard deviation divided by the square root of the sample size.
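A small simulation illustrates the distinction. The individual values below are exponential, which is about as far from normal as a common distribution gets; the sample size of 50 and the 2,000 repetitions are arbitrary choices for the demo.

```python
import random
import statistics

rng = random.Random(1)
n, trials = 50, 2000

# Individual values are exponential with mean 1 and sd 1:
# skewed and decidedly non-normal.
means = [statistics.mean(rng.expovariate(1.0) for _ in range(n))
         for _ in range(trials)]

# The CLT speaks only about the *averages*: approximately normal,
# centered at the population mean (1), with sd 1 / sqrt(50) ~= 0.14.
avg_of_means = statistics.mean(means)
sd_of_means = statistics.stdev(means)
```

The histogram of `means` would look bell-shaped even though no single observation does, which is exactly what the theorem promises: normality of the averages, not of the people.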

_The patent system._ The patent system supports discovery and development of new drugs, not development and study of centuries-old techniques that are ultimately unpatentable. Where do you expect the motivation to come from to do clinical research on alternative methods? Sure, NCCAM (the National Center for Complementary and Alternative Medicine) exists and provides a valuable service, but how much private investment do you expect to happen? Just asking.

In short, scientific observation and analysis are not watertight, and are not as free of subjectivity and bias as some would lead you to expect. Our system of science, and more specifically pharmaceutical development, is not a bad system, but I do wish certain people would own up to the limitations of the system, stop pretending their science is better than everybody else’s (in some cases it is, and in some cases it isn’t), and stop preaching about what is good evidence.


2 Responses

  1. John, thanks for the insight on this. Without a doubt, pharmaceuticals absolutely have their hands tied. As I understand it, it takes hundreds of millions of dollars to do enough studies for the FDA to approve a new drug. Once a new drug is approved, any competitor can easily knock off the drug and make the same claims — UNLESS the new drug is protected by a patent.

    Since *natural* substances (e.g. omega-3 fatty acids, vitamin C, etc.) and centuries-old healing techniques can NOT be patented, there is no financial incentive / legal defense for getting those approved by the FDA. There is also little reason / incentive to perform massive RCTs and the rest of the studies that always come along with new drugs.

    Therefore, Pharmas must develop and research entirely new chemicals that can be patented, and therefore protected as they spend millions (billions) to sell and market the new drug.

    If this “straw man” is correct, then Pharmas are obviously in a VERY difficult position.

    Once I realized the reality and importance of patents in the drug industry, it shed a lot of light on the situation at hand. It also made me stop thinking of drug companies as collectively “evil” but rather, slaves to a flawed (broken?) process. Of course, as you have said, there are good and bad apples in both conventional and alternative medicine.

    So again, thanks for shedding more insight on RCTs. And thanks for helping watch the quack-watchers. 😉

    -Patrick Sullivan Jr.

