An excellent article by Michael Pollan in this week’s Times Magazine is relevant to the subject I had already picked for my next post: how bad science and standard journalism practices contribute to a raft of contradictory, misleading, and often worse-than-useless nutritional and medical advice. Pollan explains how another important facet of the problem is purely political, and touches more briefly on the bad science. I will leave the politics to him and focus on the science; the media’s role should be fairly obvious already.
What is the bad science? For an example I will use a paper whose topic has been publicized recently by the media: Corrada, et al. 2005. Reduced risk of Alzheimer’s disease with high folate intake: the Baltimore longitudinal study of aging. Alzheimer’s & Dementia 1:11-18.
This paper is first a perfect example of the reductionist approach to nutrition that Pollan discusses in his article. How do the authors identify folate as a nutrient associated with Alzheimer’s? They start from self-reported diet and supplement data, which are notoriously unreliable to begin with (and, I can’t help but add, probably even worse in a study that assumes a certain proportion of the participants will develop dementia during the study). One author then estimates each person’s intake of six nutrients of interest (vitamins C, E, B6, and B12, folate, and carotenoids). For some reason I can’t yet figure out, they then simply classify whether each person took more or less than the U.S. R.D.A. of each nutrient. The relative risk for Alzheimer’s is then calculated with a particular type of regression that I have not used, but let’s assume that part is fine.
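To make that procedure concrete, here is a minimal Python sketch of the above/below-RDA dichotomization step. The RDA thresholds and intake values below are invented for illustration; they are not taken from the paper.

```python
# Hypothetical sketch: reduce each participant's estimated nutrient
# intake to a binary "meets/exceeds the U.S. RDA" flag, as the paper
# does. All numbers here are made up for illustration.

RDA = {"folate_ug": 400, "vitamin_E_mg": 15, "vitamin_B6_mg": 1.3}

participants = [
    {"id": 1, "folate_ug": 520, "vitamin_E_mg": 9,  "vitamin_B6_mg": 2.0},
    {"id": 2, "folate_ug": 260, "vitamin_E_mg": 22, "vitamin_B6_mg": 1.1},
]

def above_rda(person):
    """Return nutrient -> True if intake meets or exceeds the RDA."""
    return {n: person[n] >= threshold for n, threshold in RDA.items()}

for p in participants:
    print(p["id"], above_rda(p))
```

Note how much information this throws away: someone at 399 µg of folate and someone at 0 µg land in the same bin.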
This procedure yields the following summary result: the calculated relative risk (RR) for folate is 0.41, supposedly meaning that if you imbibe more than the RDA of folate over a number of years, your risk of Alzheimer’s is only 41% of that of someone who takes less than the RDA.
Both vitamins E and B6 also showed a tendency to reduce risk, so the authors re-ran the analysis on the same data using only these nutrients, in a test designed to confirm the importance of these vitamins and to check whether RR was correlated among the three. They found some correlations but decided they didn’t matter, and concluded that folate was the nutrient they were looking for.
There are (at least) two major statistical problems with the paper. The first is common to a lot of scientific papers, especially medical ones, which doesn’t make it any less wrong: using the wrong significance level. The scientific standard is to call two groups of data different if the chance that they came from the same population is less than 5%. This is a completely arbitrary standard, but it is considered conservative enough to be accepted by the scientific community. The trouble is that every time you run a statistical test on data with no real difference, there is a 5% chance of a false positive: data that look different but really aren’t. Simply put, if you run 20 statistical tests as part of your analysis, then on average one of them will be a false positive. This is why I do not panic when the free blood workup our insurance offers every other year always flags one of the things it tests for (cholesterol, iron, etc.) as outside the “normal” range – if you test over 20 things and use 95% confidence intervals for where they should be, that is exactly what you should expect.
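The arithmetic behind that “one in 20” point is easy to check. This snippet, assuming independent tests run on data with no real effect, computes the expected number of false positives and the chance of getting at least one:

```python
# For n independent tests at significance level alpha, all run on data
# where no real effect exists:
#   expected false positives = n * alpha
#   P(at least one false positive) = 1 - (1 - alpha)^n
alpha = 0.05

for n in (1, 6, 20):
    expected = n * alpha
    p_any = 1 - (1 - alpha) ** n
    print(f"{n:2d} tests: expected false positives = {expected:.2f}, "
          f"P(at least one) = {p_any:.0%}")
# With 20 tests, the expected count is exactly 1.00 and
# P(at least one) is about 64%.
```

Real-world tests on the same participants are rarely fully independent, so these numbers are approximate, but the qualitative point stands.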
The paper does 6 initial analyses, so at a minimum the significance level should be set to 5%/6 ≈ 0.83%. By this standard only folate shows any activity in the original test. But then they do a whole lot of other comparisons – they break each vitamin into smaller intake groups and compare the different levels to see which are associated with Alzheimer’s. In the first table alone, there are a total of 18 statistical comparisons. Bottom line: if you fish for enough stuff for long enough, you’re bound to catch something. (This also explains why all these “breakthrough” studies are so ridiculously contradictory when you put them all together – which the authors of this paper themselves admit in their discussion – and why all these nutrition fads come and go with such rapidity.)
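The corrected threshold above is the standard Bonferroni correction: divide the significance level by the number of comparisons. A quick check for the paper’s 6 initial analyses and the 18 comparisons in the first table:

```python
# Bonferroni correction: per-test significance threshold = alpha / m,
# where m is the number of comparisons in the family.
alpha = 0.05

for m in (6, 18):
    threshold = alpha / m
    print(f"{m:2d} comparisons: per-test threshold = {threshold:.4f} "
          f"({threshold:.3%})")
# 6 comparisons give 0.833%, matching the figure in the text;
# 18 comparisons push the threshold down to about 0.278%.
```

Bonferroni is conservative (less strict corrections such as Holm’s exist), but the paper applies no correction at all, which is the opposite error.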
The next statistical problem is that when they find 3 nutrients of interest, they reuse the same data set to confirm the suggestion that one or more of these is actually having an effect. This breaks a cardinal rule of statistics: if a data set gives you the idea that a certain effect exists, using the same data set to confirm the effect is basically a self-fulfilling prophecy. How do you deal with this problem when the data come from a long-term study? You don’t want to wait another 20 years to confirm your suspicions. Fortunately, you don’t have to. Before the study begins, you randomly split your participants into two groups. Then, when doing the analysis, you use one group to hunt for any nutrients that might be doing something, and you use the second group to test your (now formulated) hypothesis that a particular nutrient is significant.
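A minimal sketch of that split-sample design, with made-up participant IDs (nothing here comes from the paper):

```python
import random

# Split-sample design: assign each participant to a "discovery" or
# "confirmation" half BEFORE any analysis. Exploratory screening of
# all nutrients happens only in the discovery half; the confirmation
# half gets one pre-specified test of the winning hypothesis.
random.seed(0)  # fix the split so it can't be re-rolled until it "works"

participant_ids = list(range(1, 101))  # 100 hypothetical participants
random.shuffle(participant_ids)

half = len(participant_ids) // 2
discovery_set = participant_ids[:half]     # hunt for candidate nutrients here
confirmation_set = participant_ids[half:]  # test the one resulting hypothesis here

assert not set(discovery_set) & set(confirmation_set)  # no one in both halves
print(len(discovery_set), len(confirmation_set))  # 50 50
```

The cost is statistical power in each half; the benefit is that the confirmation test is an honest one, run on data the hypothesis has never seen.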
And by the way – the actual numbers of the study show that those taking more than the RDA of folate had a 7% chance of getting Alzheimer’s, while those taking less had an 11% chance. Not exactly earth-shattering.
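For a sense of scale, the crude relative risk implied by those raw rates is easy to compute. (The paper’s RR of 0.41 is adjusted for covariates via regression, so this simple ratio is not the same quantity; it just shows how modest the raw difference is.)

```python
# Crude (unadjusted) relative risk from the raw rates quoted above:
# 7% of the above-RDA folate group vs. 11% of the below-RDA group
# developed Alzheimer's.
risk_high_folate = 0.07
risk_low_folate = 0.11

crude_rr = risk_high_folate / risk_low_folate
absolute_risk_reduction = risk_low_folate - risk_high_folate

print(f"crude RR = {crude_rr:.2f}")  # crude RR = 0.64
print(f"absolute risk reduction = {absolute_risk_reduction:.0%}")  # 4%
```

An absolute difference of 4 percentage points is a far less dramatic headline than “59% lower risk.”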
None of this touches on the idiotic notion that single nutrients are somehow acting in a vacuum to prevent or cause disease. But the media is just as in love with this idea as “nutritionists” are. Read Pollan’s article. It will be obvious how this all fits together to create a multibillion-dollar supplements industry for stuff we feel we need because we are eating so much crap produced by the multibillion-dollar processed food industry.
These studies are a waste of time and money.