Statistical commentary on the Geiers’ latest paper, Part I


The Geier -brothers- +father and son team+ have “released a paper”:http://www.jpands.org/vol11no1/geier.pdf discussing the link between the vaccine preservative thimerosal (a compound containing -m-ethylmercury) and autism. I have been “invited to comment”:http://www.patsullivan.com/blog/2006/03/geiers_link_aut.html on the matter, and have chosen to accept the invitation in a very narrow capacity: I will comment only on the statistics of the paper and how they relate to its conclusions. I’m simply not interested in rehashing “old arguments”:http://www.randomjohn.info/wordpress/2005/09/30/thimerosol/ (also “here”:http://www.randomjohn.info/wordpress/2005/10/06/some-thimerosal-references/ and “here”:http://www.randomjohn.info/wordpress/2006/01/27/discover-magazine-on-mercury/). Those discussions take place regularly in other, more appropriate forums.

Of course, Orac “offers up his thoughts”:http://scienceblogs.com/insolence/2006/03/the_geiers_go_dumpsterdiving_y_1.php on the matter, and maybe I’ll discuss his objections as well.

First, let me offer up a warning. If you publish a graph, be it in electronic or paper form, you might as well publish your dataset, because I can “get the data back out”:http://www.datathief.org, and, for $25, anyone else can, too. In fact, that’s what I did here, at least with the Geiers’ VAERS(Vaccine Adverse Event Reporting System) database graph. While small inaccuracies will inevitably arise from extracting data out of a visual representation, the data I got were pretty good for this exercise in armchair statistics.

Second, let me note my methodology. Since I decided to restrict my attention to statistical methodology, I decided to trust the data. Of course, given that I don’t know exactly what QA procedures these data have gone through, there are a lot of holes to be covered before the data can be considered completely trustworthy. For example, some people take issue with the VAERS(Vaccine Adverse Event Reporting System), even calling it “corrupt”:http://oracknows.blogspot.com/2006/02/how-vaccine-litigation-distorts.html. Again, I’ll leave those arguments elsewhere in the blogosphere, and save only a few comments at the end for things that are present in the Geiers’ paper.

h3. Examining the VAERS(Vaccine Adverse Event Reporting System) data

So, after reading the paper, I decided to load the VAERS(Vaccine Adverse Event Reporting System) graph into “DataThief”:http://www.datathief.org and output the results into a “text file”:#vaersdata containing the reporting date and the number of new reported cases. Then I checked the Geiers’ math by running the simple linear regressions they described in the paper. (I’ll comment on the appropriateness of this methodology later.)

The math checks out OK, which is also a bit of validation of my “data”:#vaersdata against the Geiers’ data (again, without having verified their methods of extracting and cleaning the data). However, I ran the regression against raw dates (so that a day is an _x_-unit, rather than a quarter), which means the slopes below are in new cases per day. The simple linear regression gives me the following:

| *Reporting Date* | *Parameter* | *Value* | *SE* | *p* | *R^2* |
|1/94-12/03|Slope (change in new cases)|0.01425|0.002|<0.0001|0.60|
|1/03 – 8/05|Slope (change in new cases)|-0.0302|0.01074|0.0156|0.3977|
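For readers who want to see what a simple linear regression against raw dates looks like in practice, here is a minimal Python sketch. It is an illustration only, not the code behind the table: it uses just the ten rows of the data table dated 2003 or later, and I am assuming that row selection, so the slope and R^2 it prints should not be expected to match the reported values. The function name is made up for the example.

```python
from datetime import date

# Ten rows copied from the data table at the end of this post (2003 onward).
# Which rows belong in each reporting period is an assumption made here.
rows = [
    (date(2003, 3, 25), 42.2573),
    (date(2003, 6, 22), 58.307),
    (date(2003, 9, 26), 19.3002),
    (date(2003, 12, 24), 15.0339),
    (date(2004, 3, 29), 23.1603),
    (date(2004, 6, 26), 38.1941),
    (date(2004, 9, 23), 20.1129),
    (date(2004, 12, 28), 14.2212),
    (date(2005, 3, 27), 19.909),
    (date(2005, 6, 25), 33.1151),
]


def simple_linear_regression(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Use the day number of each date as x, so one x-unit is one day.
xs = [d.toordinal() for d, _ in rows]
ys = [cases for _, cases in rows]

slope, intercept, r_squared = simple_linear_regression(xs, ys)
print(f"slope = {slope:.5f} new cases per day, R^2 = {r_squared:.4f}")
```

Because the dates are converted to day numbers, the slope is a change in new cases per day; multiplying by roughly 91 would put it on a per-quarter scale.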

I must admit, there are several things that bug me about this analysis. First, check the reporting dates again. They overlap. The Geiers do justify this by saying:

bq. The reporting quarter periods were defined so as to overlap slightly, to maximize the possibility of capturing the peak reporting period in both groups.

In allowing the reporting periods to overlap, they destroyed any possibility of a meaningful relationship between the two slopes. I’ll have to go with the critics of the study on this point and say that this is a bizarre analysis.

So what is a better analysis to do here? I can think of a couple.

First, if we wanted to retain the simplicity and easy interpretability of the two-piece linear regression, we could run regressions on two nonoverlapping areas. To top things off, we could show that the two-part regression explains the data better than a one-part simple linear regression by using a nested _F_ test or likelihood ratio test. For even more fun, we can show that a two-part model is more appropriate than a three-part model.

In the presence of serial correlation, such as “autocorrelation”:http://www.itl.nist.gov/div898/handbook/eda/section3/eda35c.htm, one of the time-series approaches, such as change point analysis using “ARIMA(Autoregressive Integrated Moving Average)”:http://www.itl.nist.gov/div898/handbook/pmc/section4/pmc446.htm, is more appropriate.
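To make “serial correlation” a little more concrete, here is a small Python sketch of the usual lag-_k_ sample autocorrelation. The series in it is made up purely for illustration (it is not the VAERS data or regression residuals), and the function name is hypothetical.

```python
def autocorr(series, lag):
    """Lag-k sample autocorrelation: lagged cross-products over total variation."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((y - mean) ** 2 for y in series)
    num = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return num / denom


# Made-up series that flips sign every step, so neighbours are strongly
# negatively correlated.
toy_series = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

r1 = autocorr(toy_series, 1)
print(f"lag-1 autocorrelation = {r1:.3f}")
```

Values near zero at every lag suggest that ordinary regression assumptions are not badly violated; values far from zero are the situation in which the time-series methods mentioned above earn their keep.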

Finally, a flexible regression method such as “LOWESS(Locally Weighted regression Smoothing Scatterplots)”:http://www.itl.nist.gov/div898/handbook/pmd/section1/pmd144.htm can reveal changes in the slope of the data.

So let’s try some of these methods!

First, I ran a LOWESS(Locally Weighted regression Smoothing Scatterplots) with two different smoothing parameters, 0.2 and 0.6. I plotted the data along with the LOWESS(Locally Weighted regression Smoothing Scatterplots) prediction and 99% confidence limits. Here are the results if you’re interested:

* “Smoothing parameter 0.2 (about right)”:http://www.randomjohn.info/wordpress/wp-content/uploads/2006/03/gplot.gif
* “Smoothing parameter 0.6 (a little oversmoothed)”:http://www.randomjohn.info/wordpress/wp-content/uploads/2006/03/gplot1.gif

Choosing smoothing parameters is a bit of an art, though there are some techniques to help along the way. Rather than quibble over this point, I would like to note the essential features of the graphs.
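For anyone curious about what the smoothing parameter actually does, here is a stripped-down Python sketch of the idea: at each point, fit a weighted straight line to the nearest fraction of the data, using tricube weights. It is a simplification under stated assumptions. It leaves out the robustness iterations of full LOWESS, it does not compute confidence limits, and it is not the software that produced the plots above. The function names and the demo numbers are made up.

```python
import math


def tricube(u):
    """Tricube kernel: weight falls smoothly from 1 at u = 0 to 0 at u >= 1."""
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def lowess(xs, ys, frac):
    """Locally weighted linear fit at each x (no robustness iterations)."""
    n = len(xs)
    k = min(n, max(3, math.ceil(frac * n)))  # points in each local window
    fitted = []
    for x0 in xs:
        # Bandwidth: distance from x0 to its k-th nearest data point.
        h = sorted(abs(x - x0) for x in xs)[k - 1]
        ws = [tricube(abs(x - x0) / h) if h > 0 else 1.0 for x in xs]
        sw = sum(ws)
        mx = sum(w * x for w, x in zip(ws, xs)) / sw
        my = sum(w * y for w, y in zip(ws, ys)) / sw
        sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
        sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        # Evaluate the local weighted line at x0.
        fitted.append(my + slope * (x0 - mx))
    return fitted


# Made-up demo data: flat, then rising, then falling.
demo_x = [float(i) for i in range(12)]
demo_y = [1.0, 0.5, 1.5, 1.0, 3.0, 5.5, 8.0, 11.0, 9.0, 6.5, 5.0, 3.5]

for frac in (0.3, 0.6):
    smooth = lowess(demo_x, demo_y, frac)
    print(frac, [round(v, 2) for v in smooth])
```

A small fraction lets the curve follow local wiggles; a large one averages over more of the series and flattens them, which is the trade-off behind “about right” versus “a little oversmoothed.”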

* The number of reported new cases stays flat until sometime in 1999.
* At that time in 1999, the number of new cases starts to rise, and it keeps rising until early 2003.
* From early 2003 until the last reporting date, the number of new reported cases falls.

I’m starting to suspect that if we went with the piecewise linear model as the Geiers (sort of) did, we’d have to go with the three-part model. The numbers below come from fitting segmented linear models with the NLIN procedure in “SAS”:http://www.sas.com.

| *Model* | *# parameters* | *MSE(Mean Square Error)* | *Change in MSE(Mean Square Error) from simpler model* | *F* | *p (from _F_(2,39) distribution)* |
|Linear regression|2 (slope, intercept)|165.02994| | | |
|Two-part linear regression (change at Jan 2003 like the Geiers’ model)|4 (2 slopes and intercepts)|134.1|30.93|0.402|0.672|
|Three-part linear regression (changes at Jun 1999 and Jan 2003)|6 (3 slopes and intercepts)|76.9536|57.1464|0.743|0.482|

The resulting _F_’s were compared against _F_(2,39) because each stage involved adding two parameters, and the most complex model was taken to be the three-part model, whose MSE had 39 degrees of freedom (45 data points – 6 parameters). Also note that I didn’t constrain the model to be continuous. When I do a real change point analysis in Part II, I’ll fix that.
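The arithmetic in the table can be checked with a few lines of Python. This sketch assumes that each _F_ is the change in MSE divided by the three-part model’s MSE, which is the reading that matches the printed values; a textbook nested _F_ test would instead put the change in residual sum of squares per added parameter in the numerator. The tail probability uses the closed form that holds when the numerator has exactly 2 degrees of freedom. The function and variable names are made up for the example.

```python
def f2_tail(f, d2):
    """P(F > f) for an F distribution with 2 numerator and d2 denominator df.

    With 2 numerator degrees of freedom the tail has the closed form
    (1 + 2 f / d2) ** (-d2 / 2).
    """
    return (1.0 + 2.0 * f / d2) ** (-d2 / 2.0)


# MSE values copied from the table above.
mse_linear = 165.02994
mse_two_part = 134.1
mse_three_part = 76.9536

df_full = 45 - 6  # 45 data points minus 6 parameters in the three-part model

# Assumed reading of the table: change in MSE over the full model's MSE.
f_two = (mse_linear - mse_two_part) / mse_three_part
f_three = (mse_two_part - mse_three_part) / mse_three_part

p_two = f2_tail(f_two, df_full)
p_three = f2_tail(f_three, df_full)

print(f"two-part vs linear:     F = {f_two:.3f}, p = {p_two:.3f}")
print(f"three-part vs two-part: F = {f_three:.3f}, p = {p_three:.3f}")
```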

Of course, this is pretty much just fancy playing around with the data at this point. The point here was to show that doing a piecewise linear regression is fraught with danger because it is so flexible. You can select change points arbitrarily, and this is where I get into even fancier statistics.
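To see how much freedom the analyst has here, consider this Python sketch of an unconstrained piecewise fit: a separate least-squares line in each segment, with no requirement that the pieces join up, and an MSE that charges two parameters per segment. It is a toy stand-in for the idea, not the SAS NLIN code used above; the data are made up, the change points are whatever you pass in, and the function names are hypothetical.

```python
def line_sse(points):
    """Residual sum of squares of a least-squares line through the points."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    slope = sxy / sxx
    return sum((y - (my + slope * (x - mx))) ** 2 for x, y in points)


def piecewise_mse(xs, ys, breaks):
    """MSE of separate lines fitted on each side of the given change points.

    The segments are not forced to meet at the change points, and each
    segment needs at least two distinct x values.
    """
    edges = [float("-inf")] + sorted(breaks) + [float("inf")]
    sse = 0.0
    for lo, hi in zip(edges, edges[1:]):
        segment = [(x, y) for x, y in zip(xs, ys) if lo <= x < hi]
        sse += line_sse(segment)
    n_params = 2 * (len(breaks) + 1)  # one slope and one intercept per segment
    return sse / (len(xs) - n_params)


# Made-up series: rises for six steps, then falls for six.
toy_x = list(range(12))
toy_y = [0, 1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]

for breaks in ([], [6], [4, 8]):
    print(breaks, round(piecewise_mse(toy_x, toy_y, breaks), 4))
```

Every choice of change points gives a different MSE, and nothing in the fit itself says which choice is legitimate, which is why the change points need to be estimated rather than picked by eye.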

And this is where I arbitrarily cut off Part I of my statistical review of the Geiers’ paper. Look for Part II fairly soon, where I examine flexible changepoints (heh heh, lots of fun) and maybe also look at time-series analyses (though I might save that for Part III).

*Conclusion so far*. I’m afraid it’s not looking good for the Geiers here. Obviously _something_ is going on, and I don’t think that the Geiers’ analysis adequately explains the upturn in newly diagnosed autism cases in the VAERS(Vaccine Adverse Event Reporting System) around mid-1999 and the downturn at around the beginning of 2003.

Before getting your engines revved up one way or the other, it’s important to realize that many factors could have gone into this phenomenon:

* Increased awareness (addressed in the Geiers’ article, and I’ll let others debate the merit of that argument)
* Lags from the time vaccinations were rearranged to diagnosis age
* Changes in pollution, not even addressed by the article
* Interactions among any of the above, and any other issue people can come up with

Again, I don’t think we’ve seen an adequate explanation.

h3(#vaersdata). The VAERS(Vaccine Adverse Event Reporting System) Data

This is the number of new autism cases in the VAERS(Vaccine Adverse Event Reporting System), as derived from Figure 2 of “Geier and Geier, 2006”:http://www.jpands.org/vol11no1/geier.pdf.

| *Reporting Date* | *New cases in VAERS(Vaccine Adverse Event Reporting System)* |
|27-Mar-1994|0.2032|
|28-Sep-1994|12.3928|
|27-Dec-1994|4.0632|
|26-Mar-1995|0.2032|
|23-Jun-1995|0.2032|
|21-Sep-1995|3.0474|
|1-Jan-1996|0.2032|
|30-Mar-1996|2.2348|
|27-Jun-1996|4.0632|
|24-Sep-1996|4.0632|
|23-Dec-1996|0.2032|
|22-Mar-1997|6.298|
|19-Jun-1997|4.0632|
|23-Sep-1997|5.079|
|28-Dec-1997|8.3296|
|27-Mar-1998|4.0632|
|1-Jul-1998|1.219|
|28-Sep-1998|4.0632|
|26-Dec-1998|1.219|
|25-Mar-1999|8.3296|
|23-Jun-1999|5.079|
|20-Sep-1999|14.2212|
|31-Dec-1999|9.1422|
|29-Mar-2000|9.1422|
|27-Jun-2000|16.2528|
|24-Sep-2000|23.1603|
|21-Mar-2001|20.1129|
|25-Jun-2001|21.1287|
|22-Dec-2000|28.0361|
|22-Sep-2001|32.0993|
|27-Dec-2001|30.2709|
|26-Mar-2002|38.1941|
|26-Dec-2002|40.2257|
|27-Sep-2002|76.9977|
|22-Jun-2003|58.307|
|26-Sep-2003|19.3002|
|24-Dec-2003|15.0339|
|29-Mar-2004|23.1603|
|23-Sep-2004|20.1129|
|28-Dec-2004|14.2212|
|25-Jun-2005|33.1151|
|26-Jun-2004|38.1941|
|25-Mar-2003|42.2573|
|24-Jun-2002|52.2122|
|27-Mar-2005|19.909|


7 Responses

  1. John, remember, your readers aren’t as smart as you. (Well, I can only speak for myself.) I don’t speak Statis-ese, but I’m trying to learn.

    Seriously though, thanks for digging into this. No matter where the data points us, let’s follow it!

    Btw, I think for you to retain your credibility, you’re going to have to remain as unpaid consultant. So, my apologies for the $25 you invested based on my suggestion. I’ll buy you lunch next time you come to Arizona. 🙂

    – Patrick Sullivan Jr.

  2. Oh yeah, quick clarification: David Geier is Dr. Mark Geier’s son. (Geez, did you really even read their report? 😉

  3. Oops. _Mea culpa_. I’ll start paying attention to things other than the statistics.

    And the Datathief will definitely get used on other things. 😉 I don’t consider it $$$ wasted, so don’t worry.

  4. […] In Part I, I discussed how Mark and David Geier in their paper used a bizarre analysis of VAERS data to support their conclusion that the removal of thimerosal from vaccines resulted in a decline in cases of newly diagnosed/reported autism. I concluded that their incorrect analysis did not really support their conclusions. I stand agnostic on their conclusions. […]

  5. […] In Part I, Part II, and Part III of this series I discussed the statistical methodologies in the recent paper by Mark and David Geier, who extracted data from the VAERS and the CDDS and tried to show that efforts to remove the compound thimerosal from vaccines have resulted in a decrease in new autism cases (and other neurodevelopmental disorders). In Part I, I concluded that their statistical methodology was invalid and unable to support their conclusions. In Part II, I suggested that they could make their point more soundly by employing better, time-series-related methodologies. In Part III, I briefly examined their CDDS data, and concluded that the methodology was invalid, and correct methodology did not back up their claims. In this final part, I examine data quality issues and wrap this series up by examining a few other criticisms of their work. […]

  6. […] In Part I and Part II of this series I discussed the statistical methodologies in the recent paper by Mark and David Geier, who extracted data from the VAERS and the CDDS and tried to show that efforts to remove the compound thimerosal from vaccines have resulted in a decrease in new autism cases (and other neurodevelopmental disorders). In Part I, I concluded that their statistical methodology was invalid and unable to support their conclusions. In Part II, I suggested that they could make their point more soundly by employing better, time-series-related methodologies. In this part, I take a brief look at their CDDS data. I did try the odd linear regression techniques of the Geiers again, just to check things. I got similar results. […]

  7. […] I enjoy doing statistics, and do it both for pay and fun. Recently I did a four-part series critically looking at the statistics done by a certain Mark Geier, MD and David Geier, who examined two databases and found a link between the removal of methylmercury-based thimerosal from vaccines and drops in reports of new autism cases. I concluded that their methodology was not valid and that correct methodology might make their case stronger. Parts I-III were a bit heavy on the statistical analysis (“in Klingon” as one blogger put it). […]
