We trust confidence intervals, and they are reported with just about every study in existence. However, they are often wrong: a 95% interval misses the true value 5% of the time by design, and on top of that many intervals are computed incorrectly.

In doing applied statistics day after day, it’s difficult to remember all of the theoretical statistics I learned in graduate school. (I got my degree at UNC-Chapel Hill, which is known for its theoretical statistics program.) So I brought home my copy of Van der Vaart’s “??Asymptotic Statistics??”:http://www.amazon.com/Asymptotic-Statistics-Statistical-Probabilistic-Mathematics/dp/0521784506/sr=1-1/qid=1164382703/ref=pd_bbs_sr_1/002-3849726-6409614?ie=UTF8&s=books. It’s a very nice combination of applied and theoretical statistics, with lots of examples and reminders about why the statistical methods we use in the applied world work the way they do. With this study came a stark reminder about some of the dangers in overinterpreting the results of statistical methods.

There’s this very odd-looking result called the law of the iterated logarithm: lim sup_{n→∞} (Y_{1} + … + Y_{n})/√(n log log n) = √2 with probability 1, for a sequence of independent, identically distributed random variables Y_{1}, Y_{2}, … with mean 0 and variance 1.

So what does this mean? lim sup means the maximum “in the limit,” i.e. the largest value that strange-looking expression keeps coming back to even if you ignore the first 10, 100, or even 1,000,000 sums. Now, what about that strange-looking expression? It’s the average of Y_{1}, …, Y_{n}, multiplied by √(n/log log n). If you’ve gone through a few statistics classes and an advanced calculus class or two, you might recognize that the average goes to 0 (because that’s the mean of all the Y_{n}) while √(n/log log n) goes to ∞, so “in the limit” that strange-looking expression is an indeterminate 0×∞. For any single large n it is very likely to be near 0 (it converges to 0 in probability), but along the whole sequence it comes close to √2 infinitely often.
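To make the expression less abstract, here is a minimal Python sketch that tracks it along one simulated sequence. The function names (@lil_statistic_path@, @tail_max@), the choice of standard normal Y's, and the seed are my own assumptions for illustration; any distribution with mean 0 and variance 1 would fit the theorem.

```python
import math
import random

def lil_statistic_path(n_max, seed=0):
    """Return [(n, S_n / sqrt(n log log n))] for n = 3..n_max.

    S_n is the running sum of standard normal draws (an assumed choice;
    the theorem only needs mean 0 and variance 1).
    """
    rng = random.Random(seed)
    total = 0.0
    path = []
    for n in range(1, n_max + 1):
        total += rng.gauss(0.0, 1.0)
        if n >= 3:  # log log n is positive only once n > e
            path.append((n, total / math.sqrt(n * math.log(math.log(n)))))
    return path

def tail_max(path, n_min):
    """Largest value of the statistic over all n >= n_min (a finite stand-in for lim sup)."""
    return max(value for n, value in path if n >= n_min)
```

Calling @tail_max(lil_statistic_path(1_000_000), n_min)@ for increasing @n_min@ gives a rough feel for the lim sup. Convergence here is extremely slow (log log n barely grows), so a finite simulation only hints at the √2 limit.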

This is a result that applied statisticians don’t use very often, but it still has important implications. I won’t go into the full argument, but an example illustrates how this means that confidence intervals are wrong.

Let’s say that we are measuring the average height in a population (and let’s say there are an infinite number of people). We measure n = 100 people and construct a 95% confidence interval from AVE − 2/√n to AVE + 2/√n, where AVE is the sample average and heights are scaled to have standard deviation 1. (I use 2 to make this simple, though statisticians might rather see 1.96.) And then say we do a sequential testing scheme: we add another person into the study and redo the average and confidence interval. Then add another person and redo the average and confidence interval again.

The confidence interval covers the true average height only if |AVE − true mean|×√(n/log log n) < 2/√(log log n). The right-hand side shrinks to 0 as n grows, while the law of the iterated logarithm says the left-hand side comes close to √2 infinitely often. So coverage fails infinitely often in our sequential testing scheme above.
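The sequential scheme is easy to try out in simulation. The following is a rough sketch, not a definitive implementation: it assumes standard normal "heights" with true mean 0 and standard deviation 1, and the names @covers@, @path_has_miss@, and @fraction_of_paths_with_miss@ are hypothetical helpers I made up for this post.

```python
import math
import random

def covers(avg, n, true_mean=0.0, z=2.0):
    """True if the interval avg +/- z/sqrt(n) contains true_mean (SD assumed to be 1)."""
    return abs(avg - true_mean) < z / math.sqrt(n)

def path_has_miss(rng, n_start, n_max):
    """Simulate one study that adds one person at a time.

    Returns True if any interval computed at n_start..n_max fails to
    cover the true mean (0 in this simulated setup).
    """
    total = 0.0
    for n in range(1, n_max + 1):
        total += rng.gauss(0.0, 1.0)
        if n >= n_start and not covers(total / n, n):
            return True
    return False

def fraction_of_paths_with_miss(n_paths=200, n_start=100, n_max=2000, seed=0):
    """Share of simulated studies in which at least one sequential interval misses."""
    rng = random.Random(seed)
    misses = sum(path_has_miss(rng, n_start, n_max) for _ in range(n_paths))
    return misses / n_paths
```

Comparing @fraction_of_paths_with_miss()@ against the nominal 5% shows the cost of recomputing the interval after every new person: each individual interval still misses about 5% of the time, but the chance that at least one of them misses is considerably larger and keeps growing with @n_max@.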

Two take-home lessons:

# 95% confidence intervals are wrong 5% of the time, and refining them to make them smaller doesn’t change that fact

# approaching sequential testing in a naive way leads, infinitely often, to confidence intervals that are wrong
