Lying with Statistics Part IV: the well-chosen measure and subpopulation

Through a happy coincidence, I get to base a new Lying with Statistics entry on a current event. It seems that the connection between job stress and chronic hypertension is tenuous. While the medical result is interesting in its own right, a review of some of the studies reveals ways in which statistics can be misused or abused (or simply tortured) to support a conclusion opposite to the one any reasonable interpretation would recommend.

First up is the well-chosen measure. In some of the questionable studies, particular measures of job stress were emphasized, while measures that happened to weaken the case for a correlation between stress and chronic hypertension were played down. In two of the studies, a discussion of systolic blood pressure was omitted entirely! (Systolic blood pressure is an important part of the hypertension discussion.)

Second up is the well-chosen subpopulation. Several of the studies showed no overall correlation but focused instead on subpopulations that did happen to show a correlation.

The reasons these practices are bad are the same as those discussed before: the mind is good at picking up patterns once the data is in, and if you choose a hypothesis test after the data is in, based on your view of the data, chances are that the test will confirm the “pattern” your brain picked up. The probability of a false positive is then much higher than what you want (usually 5%), and how much higher is generally unknown.
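To make the inflation concrete, here is a rough simulation sketch. It is not taken from any of the studies; every number in it (10 subgroups, 20 subjects per group, a z-test with known variance, independent subgroups) is an assumption chosen purely for illustration. There is no real effect anywhere in the simulated data, so every "significant" result is a false positive.

```python
import random
from statistics import NormalDist, mean

ALPHA = 0.05  # the usual 5% false positive target


def null_p_value(n, rng):
    """Two-sided z-test p-value for two groups drawn from the SAME N(0, 1).

    Any "significant" difference here is a false positive by construction.
    """
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    # Variance is known to be 1 in each group (an illustrative simplification).
    z = (mean(a) - mean(b)) / (2 / n) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_subgroups, n_studies=2000, n=20, seed=0):
    """Fraction of null studies where the best-looking subgroup is 'significant'."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        pvals = [null_p_value(n, rng) for _ in range(n_subgroups)]
        # Picking the subgroup after seeing the data = reporting the smallest p.
        if min(pvals) < ALPHA:
            hits += 1
    return hits / n_studies


prespecified = false_positive_rate(n_subgroups=1)    # one planned test
cherry_picked = false_positive_rate(n_subgroups=10)  # best of 10 looks
expected = 1 - (1 - ALPHA) ** 10  # exact rate for 10 independent tests

print(f"one prespecified test: {prespecified:.3f}")
print(f"best of 10 subgroups:  {cherry_picked:.3f} (theory: {expected:.3f})")
```

With ten independent looks the theoretical rate is 1 - 0.95^10, or about 40%, rather than 5%. In the simulation we can compute that figure only because we know exactly how many looks were taken. In a real post-hoc analysis, nobody records how many patterns were informally considered and discarded, which is why the true rate is generally unknown.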

The antidote, of course, is to prespecify your hypothesis test, and then execute as you prespecified. (Or make sure that a major or primary result of a journal article you read is a result of a planned test.) Treat all tests that are generated after the data is in as exploratory and, at best, worthy of further pursuit.

Note that not all subpopulation analyses or analyses of specific measures are bad. Ones that are prespecified, of course, are worthy of some confidence. Odd results in a subpopulation analysis may lead to new discoveries. However, be especially skeptical if you come across the following situations (and the more of these you find, the more suspicious you should be):

* no evidence that a subpopulation or measure was specified before data was collected
* the overall population did not show the same result as the subgroup analysis, or results across subgroups are inconsistent
* results across measures of the same thing (e.g. surveys of job stress) are inconsistent

Finally, it’s also important to realize that the motivation behind the well-chosen measure or subgroup isn’t usually malicious (although it may be!).
