Junk Charts takes on “March mildness”

No, not the weather! We’re talking about bracketology. Apparently this was a year running low on upsets, and the NYT(New York Times) wanted to make a point. By improving on the NYT(New York Times) chart, Junk Charts makes some interesting discoveries: 10-7 upsets have occurred almost as frequently in recent history as 9-8 upsets. In fact, in 1999, all the 9 and 10 seeds won!

11-6 and 12-5 upsets are not that uncommon (bye-bye, Duke!). 13-4 upsets are relatively uncommon, and 14-3 upsets are rare (unfortunately including a nasty splotch on UNC’s record). 15-2 upsets have occurred once, and 16-1 upsets have never occurred, as you hear over and over every March.


Hey, Swivel’s up!

Share your data analysis all you want. Swivel’s up!


Law of the iterated logarithm, and a reminder on the limitations of statistics

We trust confidence intervals, and they are reported with just about every study in existence. However, they are often wrong, and not just because a 95% interval misses the truth 5% of the time by design: they are also often incorrectly computed.

In doing applied statistics day after day, it’s difficult to remember all of the theoretical statistics I learned in graduate school. (I got my degree at UNC-Chapel Hill, which is known for its theoretical statistics program.) So I brought home my copy of Van der Vaart’s “??Asymptotic Statistics??”:http://www.amazon.com/Asymptotic-Statistics-Statistical-Probabilistic-Mathematics/dp/0521784506/sr=1-1/qid=1164382703/ref=pd_bbs_sr_1/002-3849726-6409614?ie=UTF8&s=books. It’s a very nice combination of applied and theoretical statistics, with lots of examples and reminders about why the statistical methods we use in the applied world work the way they do. Rereading it brought a stark reminder of some of the dangers in overinterpreting the results of statistical methods.

There’s this very odd-looking result called the law of the iterated logarithm: for an i.i.d. sequence of random variables Y1, Y2, … with mean 0 and variance 1, the lim sup as n→∞ of (Y1 + … + Yn)/√(n log log n) equals √2, almost surely.

So what does this mean? lim sup means the maximum “in the limit,” i.e. the largest value that strange-looking expression keeps attaining if you ignore the first 10, 100, or even 1,000,000 sums. Now, what about that strange-looking expression? It’s the average of Y1, …, Yn multiplied by √(n/log log n). If you’ve gone through a few statistics classes and an advanced calculus class or two, you might recognize that the average goes to 0 (that’s the law of large numbers, since each Yn has mean 0), while the factor √(n/log log n) grows without bound, so “in the limit” the product is an indeterminate 0×∞. The average converges to 0, but magnified by that factor, the product comes arbitrarily close to √2 infinitely often (and, by symmetry, arbitrarily close to −√2 as well).
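You can get a feel for this with a small simulation. Here is a minimal sketch in Python (the seed and run length are arbitrary choices of mine); keep in mind that convergence in the law of the iterated logarithm is notoriously slow, so a finite run only hints at the √2 ceiling:

bc.. import numpy as np

# Follow S_n / sqrt(n log log n) along one long walk of mean-0,
# variance-1 steps. The lim sup is sqrt(2) = 1.414..., but convergence
# is very slow, so a finite run only gestures toward it.
rng = np.random.default_rng(seed=42)
n_max = 1_000_000
steps = rng.standard_normal(n_max)   # Y_1, Y_2, ...: mean 0, variance 1
sums = np.cumsum(steps)              # S_n = Y_1 + ... + Y_n

n = np.arange(3, n_max + 1)          # start at n = 3 so that log log n > 0
stat = sums[2:] / np.sqrt(n * np.log(np.log(n)))

print(f"running max of S_n/sqrt(n log log n): {stat.max():.3f}")
print(f"final average S_n/n: {sums[-1] / n_max:.4f}")  # hugging 0, per the LLN

p. The average in the second line hugs 0, while the rescaled sum in the first line refuses to die out the way the average does; that tension is the 0×∞ above.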

This is a result that applied statisticians don’t use very often, but it still has important implications. I won’t go into the full argument, but an example illustrates how it makes confidence intervals go wrong.

Let’s say that we are measuring the average height in a population (say there are an infinite number of people, and scale the heights to have variance 1 to keep the algebra simple). We measure 100 people and construct a 95% confidence interval AVE − 2/√n to AVE + 2/√n. (I use 2 to make this simple, though statisticians might rather see 1.96.) And then say we adopt a sequential testing scheme: we add another person to the study and recompute the average and confidence interval. Then we add another person and recompute them again, and so on.

The confidence interval covers the true average height only if |AVE − true mean| < 2/√n, which is the same as requiring |AVE − true mean| × √(n/log log n) < 2/√(log log n). The right-hand side shrinks to 0 as n grows, while the law of the iterated logarithm says the left-hand side comes close to √2 infinitely often. So, in the sequential testing scheme above, the interval fails to cover the true mean infinitely often.
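Here is a minimal simulation sketch of that naive scheme (the sequence count, run length, and seed are arbitrary choices of mine): recompute AVE ± 2/√n after every new observation and count how many sequences ever produce an interval that misses the true mean.

bc.. import numpy as np

# Naive sequential testing with unit-variance data: the interval
# AVE +/- 2/sqrt(n) is recomputed after each new observation, and we
# count sequences whose interval excludes the true mean at least once.
rng = np.random.default_rng(seed=1)
true_mean, n_sequences, n_max = 0.0, 1_000, 10_000

missed = 0
for _ in range(n_sequences):
    y = rng.standard_normal(n_max) + true_mean
    n = np.arange(1, n_max + 1)
    ave = np.cumsum(y) / n                    # running average after each subject
    if np.any(np.abs(ave - true_mean) > 2 / np.sqrt(n)):
        missed += 1

print(f"sequences missing the true mean at least once: {missed / n_sequences:.0%}")

p. Each individual interval still misses less than 5% of the time; it’s the chance of missing at _some_ point that keeps climbing the longer you keep peeking, which is the second take-home lesson below.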

Two take-home lessons:
# 95% confidence intervals are wrong 5% of the time, and refining them to make them narrower doesn’t change that fact
# approaching sequential testing in a naive way leads, infinitely often, to confidence intervals that are wrong


The evil twin brother of Number Needed to Treat

Some weeks ago I posted an entry on the NNT(Number Needed to Treat), which is essentially the expected/average number of people who would have to be given a treatment (surgery, pharmaceutical, or device) at the labeled dose/frequency for one person to receive the labeled benefit.

When you are talking about adverse event risk, the analogous number is the NNH(Number Needed to Harm): the expected/average number of people who would have to take a treatment at the labeled dose/frequency for one person to experience the noted adverse effect. You want this number to be large.
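The arithmetic mirrors the NNT: take the reciprocal of the absolute risk increase for the adverse event. A tiny sketch in Python, with made-up rates for illustration (not from any real trial):

bc.. def number_needed_to_harm(ae_rate_treated, ae_rate_placebo):
    """Expected number treated for one extra person to suffer the adverse event."""
    absolute_risk_increase = ae_rate_treated - ae_rate_placebo
    return 1 / absolute_risk_increase

# hypothetical rates: 4% adverse events on drug vs. 1% on placebo
print(number_needed_to_harm(0.04, 0.01))  # ~33 people treated per extra event

p. Roughly speaking, you want a drug’s NNH to be comfortably larger than its NNT; comparing the two is one crude way to weigh benefit against harm.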

See “here”:http://www.jr2.ox.ac.uk/bandolier/booth/glossary/NNH.html for more info.

(h/t “Pharmagossip”:http://pharmagossip.blogspot.com)


The bar is higher for Democrats

Andrew Gelman (a fellow statistician) and colleagues analyze the probability that the Democrats will retake the House, with a few interesting insights about the past several elections as well. Remember how Gore won the popular vote but lost the electoral college in 2000? A similar phenomenon has shown up in congressional races in most elections since 1994. The linked article is a bit heavy on the statistics, but you should be able to get the gist.

Merck’s MK-0557 — statistical vs. clinical relevance strikes again

Merck got their “p-value”:http://www.healthday.com/view.cfm?id=535281, but no one cares about losing three more pounds in _one year_ over placebo. And certainly no one wants to take a drug once a day for 365 days to do it. It looks like MK-0557 won’t be making it in the anti-obesity field.

Other fun issues with the drug and its trial:

* 832 out of 1661 enrolled subjects completed the trial (just over 50%)
* A prominent doctor commented:

bq. “In my view, this trial suggests not that a cocktail of drugs will be needed, but that for the most part, drugs are not the right answer at all,” said Dr. David L. Katz, an associate professor of public health and director of the Prevention Research Center at Yale University School of Medicine.

bq. “Though rare cases of obesity may warrant medication as part of a comprehensive treatment plan, the hope that drugs will save most of us the trouble of addressing weight control through lifestyle practices is misplaced,” he said.

* The drug completely blocked its target receptor, but the receptor does not appear to have the same biological activity researchers once thought it did; alternatively, shutting it down puts a higher load onto other systems. Obesity is a complex issue.
* Except in relatively few instances, diet and exercise are the best ways to prevent and alleviate obesity.


Slate has a good article on consumer critical thinking about statistics

Slate has a good “article”:http://www.slate.com/id/2150354/?nav=ais on how to think about whether you should take a drug. In the confusing world of “relative risk vs. absolute risk”:http://www.randomjohn.info/wordpress/2006/05/03/lying-with-statistics-relative-risk-vs-absolute-risk/, it’s really hard to know the effect of a drug.

Enter the NNT(Number Needed to Treat). The idea behind this number is the _expected_ number of people that you would have to treat so that _one_ person would realize the benefit of the treatment. For example, if the NNT is 3, then you would expect one out of every three people to benefit from the treatment.

Let’s take the Pravachol (a statin, like Lipitor) example from the article. In a 1995 study in the ??NEJM(New England Journal of Medicine)??, researchers reported a 31% reduction in the risk of heart attack in men who took one Pravachol every day for five years. In the raw numbers, 7.5% of the placebo group experienced a heart attack vs. 5.3% of the Pravachol group, a 31% relative reduction in risk but only a 2.2% absolute reduction. The NNT (see more “here”:http://www.cebm.utoronto.ca/glossary/nntsPrint.htm) is 1/2.2% = 45.5. So you would expect to have to give more than 45 men Pravachol once a day for five years to prevent one heart attack. Turned around: we expect that over 44 of them get no benefit (either they would not have experienced a heart attack anyway, or theirs would not be prevented).
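If you want to check that arithmetic yourself, here it is spelled out in Python using the rates quoted above (note that these rounded rates give a relative reduction of about 29%, in the neighborhood of the 31% the study reported):

bc.. placebo_risk = 0.075   # 7.5% of the placebo group had a heart attack
treated_risk = 0.053   # 5.3% of the Pravachol group did

absolute_reduction = placebo_risk - treated_risk        # 0.022, i.e. 2.2%
relative_reduction = absolute_reduction / placebo_risk  # ~0.29 from these rounded rates
nnt = 1 / absolute_reduction                            # ~45.5

print(f"absolute risk reduction: {absolute_reduction:.1%}")
print(f"relative risk reduction: {relative_reduction:.0%}")
print(f"NNT: {nnt:.1f}")

p. The same two risks yield a headline-friendly “31% reduction” and a far less impressive “2.2% reduction”; the NNT is just the reciprocal of the unglamorous one.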

I’ll leave all commentary aside about whether drug companies want you to think that way. The data coming from premarketing approval has to be made public (as a certain company just found out), and anyone with a calculator and absolute risk in hand can calculate an NNT.

Slate has a few interesting NNTs:

|_Drug_|_Indication_|_NNT_|
|cortisone|painful shoulder|3|
|amoxicillin|shorten fever from ear infection|20|
|Proscar (4 yr)|avoid surgery for enlarged prostate|18|
|aspirin|avoid heart attack|208|

Think about it. Think about how much you spend each year on some of these drugs, and think about the chance that they actually help.

(h/t “insider”:http://pharmagossip.blogspot.com)


Ph.D. means Piled higher and Deeper

PhDComics shows us the “difference”:http://www.phdcomics.com/comics/archive.php?comicid=761 between the scientific method and the actual method we use in research.


Lying with statistics in the news

How did I miss this one? You can find many examples of lying with statistics at STATS.org, which seems to be a non-profit associated with George Mason University (the Tar Heel in me says boo-hiss). As a university-affiliated non-profit (even if a small operation), they have much greater resources for debunking bad statistics in the media than I do. Of course, my scope is much narrower as well.

Update: via Gelman’s post on “Using numbers to persuade?”, I found this as well.

I find these sites valuable, though they make several arguments I can’t agree with. For example, “one of their articles”:http://stats.org/stories/More_Teflon_jan27_06.htm contains the following:

* _Thus the EPA saw only “suggestive evidence of carcinogenicity.” Seed also noted that “Studies have not shown any effects directly associated with PFOA exposure.”_ This isn’t a conservative statement, and any drug that went to the FDA with “suggestive evidence of carcinogenicity” would get much more thorough scrutiny. For something that isn’t a drug and serves more as a convenience, shouldn’t we apply the same scrutiny?
* _In other words, the real news in this story is that the EPA and the chemical companies have decided to take an extremely risk averse position on PFOA because of its presence in the environment and blood, but not because there is any evidence as yet to suggest that there might be a genuine risk to humans._ When it comes to preventable disease, who says risk averse is a bad position? Even if the risk applied to only three people in the United States out of everyone exposed to PFOAs through non-stick cookware, microwave popcorn, or otherwise, would a risk-averse position be unjustified?
* And in “this article”:http://stats.org/stories/Nora_Ephron_Teflon.htm Trevor Butterworth comments: _One case of deformity from one person (among thousands) who worked with PFOA is an association that is scientifically meaningless, especially when there isn’t a single health study that has ever shown any such association. This was tabloid journalism at its worst._ This was based on a CBS news story about a woman who worked at a plant with higher-than-average exposure to PFOAs and who happened to have a birth defect. While this is not proof of association (and no one with a stats or science degree would make that claim on the basis of one person), it is worrisome and certainly worthy of further investigation. DuPont certainly thought so as well.

The EPA(Environmental Protection Agency) has information “here”:http://www.epa.gov/oppt/pfoa/.

So yeah, I do find that STATS.org makes some inane pseudoskeptical arguments, such as the ones above. (And they make a lot of good ones as well.) Still, they provide a valuable service: counterbalancing a lot of the misrepresented and confounded statistical arguments found in the media.

Bad statistics and science at MDS

The FDA sent a “warning letter”:http://www.fda.gov/cder/warn/2006/MDS_Pharma.pdf to a company called MDS Pharma late last month (posted to the agency’s website yesterday) basically saying that MDS Pharma lied with statistics. (Please note that I have not seen or reviewed the company’s response.) Apparently, the agency found the following issues:

* Studies weren’t appropriately auditable.
* The company failed to investigate outliers and other anomalous results.
* Some outliers and other anomalous results were deemed not to matter, but sufficient reasoning was not given.
* They failed to account for test-retest differences in some investigations of aberrant results.
* The company used inclusion and exclusion criteria that biased their results.
* The company inappropriately documented their review of studies.
* Calibration points for their standard controls were included and excluded in a biased way that followed neither their own procedures nor, apparently, standard quality control procedures. (There is a Society for Quality Control for a reason.)
* Some measurement methods they used did not give repeatable results (repeatable meaning that results should be very similar under very similar conditions). They took a long time to discontinue the method and failed to inform the right people that the results coming from it were unreliable.
* All of the above problems were widespread, over many studies of many products over several years.

And these are _nonclinical_ bioequivalence studies, not even clinical trials. (Bioequivalence studies are designed to confirm that two different formulations of the same active drug reach the sites of action in similar concentrations after they enter the body, and are the “proof of the pudding” for generic drug applications.) In clinical trials, any of these infractions can be very serious. According to an “article”:http://www.theglobeandmail.com/servlet/story/LAC.20060913.RMDS13/TPStory/Business in ??The Globe and Mail??, this company performs bioequivalence studies for generic drug makers. This sort of behavior has these consequences:

* You cannot determine whether drugs are bioequivalent if you don’t appropriately follow up anomalous responses. This endangers the regulatory submissions of the generic drug maker clients.
* “We conclude that you failed to systematically investigate contamination and anomalous results”: an inability to detect and handle contamination appropriately has obvious serious effects in a clinical trial or post-marketing setting, and in a premarketing nonclinical setting it can skew results so that inappropriate or dangerous dosages are given to human subjects.
* “demonstrate that your retrospective review is capable of discriminating between valid and invalid study data,” — need I go further?

Furthermore, the company left out whole studies when communicating with the agency.

Now, what does this mean for MDS Pharma? I’ll let a securities analyst speak:

bq. UBS Securities Canada analyst Jeff Elliott said in a report that he was “concerned about the timing of a resolution of the FDA issues at MDS Pharma Services and a potential recovery of this unit.”

The markets also speak:

bq. Shares of MDS fell 64 cents or 3 per cent to $19.86 on the Toronto Stock Exchange yesterday.

Shall I issue my own “sell” recommendation (boring disclosure: I don’t own stock in pharma or life sciences companies except perhaps through mutual funds)?

My comment on this: it doesn’t take a Ph.D. or familiarity with standard quality control procedures to see the seriousness of these problems. I hope, for the sake of their clients, that MDS can salvage some of the studies they’ve done over the last few years, and that no one has suffered because of these problems.
