Statistical commentary on the Geiers’ latest paper, Part III

In Part I and Part II of this series I discussed the statistical methodology in the recent paper by Mark and David Geier, who extracted data from VAERS (Vaccine Adverse Event Reporting System) and the CDDS (California Department of Developmental Services) and tried to show that efforts to remove the compound thimerosal from vaccines have resulted in a decrease in new cases of autism and other neurodevelopmental disorders. In Part I, I concluded that their statistical methodology was invalid and unable to support their conclusions. In Part II, I suggested that they could make their point more soundly by employing better, time-series-related methodologies. In this part, I take a brief look at their CDDS data. I did try the Geiers' odd linear regression techniques again, just to check things, and got similar results.

I didn’t bother with the two- and three-line piecewise linear regression. I went straight for the real analysis. But first, a plot of the smoothed data.

[Figure: Smoothed scatterplot of CDDS data]

And the ACF (autocorrelation function) and PACF (partial autocorrelation function) plots:

[Figure: ACF of detrended CDDS data]

[Figure: PACF of detrended CDDS data]
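For anyone who wants to check the autocorrelation numerically, here is a minimal sketch in plain Python. It is not necessarily how the plots above were produced. It makes two assumptions of my own: the series is detrended by subtracting a least-squares straight line, and the observations (the New Cases figures listed under The Data, put in chronological order by their listed dates) are treated as equally spaced even though the dates are only roughly quarterly. The function names `detrend` and `acf` are just illustrative.

```python
import math

# "New Cases" values from The Data section, put in chronological order by
# their listed dates. Treating them as equally spaced is an assumption.
CASES = [
    176.0155, 43.3269, 311.412, 146.2282, 238.2979, 170.5996, 121.8569,
    230.1741, 232.882, 251.8375, 216.6344, 268.0851, 389.942, 305.9961,
    254.5455, 352.0309, 343.9072, 468.472, 389.942, 235.5899, 595.7447,
    435.9768, 492.8433, 427.853, 417.0213, 487.4275, 457.6402, 568.6654,
    704.0619, 663.4429, 706.7698, 660.735, 812.3791, 847.5822, 1191.4894,
    731.1412, 834.0426, 831.3346, 790.7157, 676.9826, 793.4236, 725.7253,
    752.8046, 809.6712, 736.5571, 736.5571, 679.6905,
]


def detrend(y):
    """Residuals from a least-squares line fitted against the index 0..n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sxy / sxx
    return [v - (y_mean + slope * (t - t_mean)) for t, v in enumerate(y)]


def acf(x, max_lag):
    """Sample autocorrelations r_k = c_k / c_0 for k = 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d)
    return [sum(d[i] * d[i + k] for i in range(n - k)) / c0
            for k in range(max_lag + 1)]


resid = detrend(CASES)
r = acf(resid, 10)

# Approximate 95% band for white noise: +/- 1.96 / sqrt(n).
band = 1.96 / math.sqrt(len(resid))
for k, rk in enumerate(r):
    note = "outside band" if k > 0 and abs(rk) > band else ""
    print(f"lag {k:2d}  r = {rk:+.3f}  {note}")
```

Lags whose autocorrelation falls inside the band are consistent with no autocorrelation at that lag. A different detrending choice, such as residuals from a local regression, could change the picture.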

I find the ACF and PACF plots rather surprising, because they suggest no autocorrelation at all. So I just tried a linear regression again, this time with something a little simpler than the segmented model from Part I. I created a 0-1 indicator variable for whether the date falls after 1-Jan-2003. Then I ran one regression on the date alone and one on both the date and the indicator. Again I tried a nested-models analysis, which is actually valid in this case. So here goes:

|_. Model|_. MSE|_. _F_-value|_. p-value|
|Date only|13848| | |
|Date and change at 1-Jan-2003|14163|???|???|

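Well, I don't often see the finer model come out with a higher MSE than the coarser one, but given that the _F_-value in the ANOVA (analysis of variance) table is 0.96, I guess weird things can happen. (The MSE divides the residual sum of squares by the residual degrees of freedom, so an extra parameter that barely reduces the residual sum of squares can push the MSE up; with a single added parameter, that goes hand in hand with an _F_-value below 1.) At any rate, the two-part piecewise linear model is likewise unjustified, and in this case I would just go with either a local regression (as above) or a simple linear regression.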
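For anyone who wants to rerun the nested-model comparison, here is a minimal sketch in plain Python with no dependencies. It refits both models on the figures listed under The Data and computes the _F_ statistic for the one added parameter. The helper names (`solve`, `rss`, `nested_f`), the choice to measure time in years from 1-Jan-2000, and treating the change as a simple level shift after 1-Jan-2003 are assumptions made for illustration, so the output need not match the table above exactly.

```python
from datetime import date

# (date, new cases) pairs transcribed from "The Data" section of this post.
RAW = """
21-May-1995 43.3269; 21-Feb-1995 176.0155; 9-Sep-1995 311.412; 4-Feb-1996 238.2979
7-Nov-1995 146.2282; 3-May-1996 170.5996; 1-Jul-1996 121.8569; 21-Oct-1996 230.1741
17-Jan-1997 232.882; 9-Apr-1997 251.8375; 29-Jun-1997 216.6344; 19-Sep-1997 268.0851
17-Dec-1997 389.942; 15-Mar-1998 305.9961; 12-Jun-1998 254.5455; 2-Sep-1998 352.0309
22-Nov-1998 343.9072; 8-Aug-1999 235.5899; 12-May-1999 389.942; 12-Feb-1999 468.472
26-Jan-2000 435.9768; 14-Jul-2000 427.853; 11-Oct-2000 417.0213; 30-Mar-2001 457.6402
24-Dec-2000 487.4275; 5-Nov-1999 595.7447; 16-Apr-2000 492.8433; 27-Jun-2001 568.6654
9-Sep-2001 704.0619; 14-Dec-2001 663.4429; 3-Jun-2002 660.735; 6-Mar-2002 706.7698
23-Aug-2002 812.3791; 12-Nov-2002 847.5822; 9-May-2003 731.1412; 6-Aug-2003 834.0426
17-Feb-2003 1191.4894; 3-Nov-2003 831.3346; 16-Jan-2004 790.7157; 21-Apr-2004 676.9826
12-Jul-2004 793.4236; 1-Oct-2004 725.7253; 22-Dec-2004 752.8046; 20-Mar-2005 809.6712
25-Jun-2005 736.5571; 7-Sep-2005 736.5571; 5-Dec-2005 679.6905
"""
MONTHS = {m: i + 1 for i, m in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split())}


def parse(raw):
    """Turn the 'D-Mon-YYYY value' items into (date, float) pairs."""
    rows = []
    for item in raw.replace("\n", ";").split(";"):
        if item.strip():
            stamp, value = item.split()
            day, mon, year = stamp.split("-")
            rows.append((date(int(year), MONTHS[mon], int(day)), float(value)))
    return rows


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(i + 1, n):
            f = m[r][i] / m[i][i]
            for c in range(i, n + 1):
                m[r][c] -= f * m[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss(X, y):
    """Residual sum of squares from least squares of y on the columns of X."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * xi for b, xi in zip(beta, r))) ** 2
               for r, yi in zip(X, y))


def nested_f(rows, cutoff=date(2003, 1, 1)):
    """Compare 'date only' against 'date + 0-1 indicator for after cutoff'."""
    origin = date(2000, 1, 1)  # arbitrary origin, keeps the numbers small
    t = [(d - origin).days / 365.25 for d, _ in rows]
    y = [v for _, v in rows]
    after = [1.0 if d > cutoff else 0.0 for d, _ in rows]
    n = len(y)
    rss_reduced = rss([[1.0, ti] for ti in t], y)
    rss_full = rss([[1.0, ti, ai] for ti, ai in zip(t, after)], y)
    mse_reduced = rss_reduced / (n - 2)  # intercept + slope
    mse_full = rss_full / (n - 3)        # intercept + slope + shift
    f_stat = (rss_reduced - rss_full) / mse_full  # one extra parameter
    return mse_reduced, mse_full, f_stat


rows = parse(RAW)
mse_reduced, mse_full, f_stat = nested_f(rows)
print(f"MSE, date only:          {mse_reduced:.0f}")
print(f"MSE, date and indicator: {mse_full:.0f}")
print(f"F on (1, {len(rows) - 3}) df:        {f_stat:.2f}")
```

The p-value is the upper tail of the F distribution with 1 and n - 3 degrees of freedom. If SciPy is available, `scipy.stats.f.sf(f_stat, 1, len(rows) - 3)` should give it; that part is left out to keep the sketch dependency-free.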

*Conclusion*: Again, the statistical methodology employed by the Geiers is unjustified and does not support their conclusion. Correct methodology in this case does not support their conclusion, either.

h3. The Data

Date New Cases
21-May-1995 43.3269
21-Feb-1995 176.0155
9-Sep-1995 311.412
4-Feb-1996 238.2979
7-Nov-1995 146.2282
3-May-1996 170.5996
1-Jul-1996 121.8569
21-Oct-1996 230.1741
17-Jan-1997 232.882
9-Apr-1997 251.8375
29-Jun-1997 216.6344
19-Sep-1997 268.0851
17-Dec-1997 389.942
15-Mar-1998 305.9961
12-Jun-1998 254.5455
2-Sep-1998 352.0309
22-Nov-1998 343.9072
8-Aug-1999 235.5899
12-May-1999 389.942
12-Feb-1999 468.472
26-Jan-2000 435.9768
14-Jul-2000 427.853
11-Oct-2000 417.0213
30-Mar-2001 457.6402
24-Dec-2000 487.4275
5-Nov-1999 595.7447
16-Apr-2000 492.8433
27-Jun-2001 568.6654
9-Sep-2001 704.0619
14-Dec-2001 663.4429
3-Jun-2002 660.735
6-Mar-2002 706.7698
23-Aug-2002 812.3791
12-Nov-2002 847.5822
9-May-2003 731.1412
6-Aug-2003 834.0426
17-Feb-2003 1191.4894
3-Nov-2003 831.3346
16-Jan-2004 790.7157
21-Apr-2004 676.9826
12-Jul-2004 793.4236
1-Oct-2004 725.7253
22-Dec-2004 752.8046
20-Mar-2005 809.6712
25-Jun-2005 736.5571
7-Sep-2005 736.5571
5-Dec-2005 679.6905