13 Comments
ScienceGrump's avatar

Great deep dive into how bad science operates in action. Unfortunately, this kind of thing is more common than not for claims about lifestyle and health. But here is an especially egregious case of a narrative overpowering the data. The only finding worth following up is... that vegetarians have 3x greater risk of esophageal cancers than meat eaters.

ScienceGrump's avatar

On a technical note, I will say that requiring an FDR < .05 is usually not appropriate in the way requiring small p-values is. If the FDR is well-calibrated, that means you're guaranteeing 95% of results can't be explained by the null - a *very* stringent standard of confidence. If it's not calibrated, no threshold can be considered safe.

Adam Rochussen's avatar

It's a fair point. Though the B-H FDR and the unadjusted p-value are directly related, so a small p-value will create a small FDR.

FDR = p-value * total tests / p-value rank

(and then make sure the sequence of p_adj is non-decreasing)
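For anyone who wants to see that formula in action, here is a minimal Python sketch of the Benjamini-Hochberg adjustment as described above. The function name `bh_adjust` and the example p-values are made up for illustration and are not taken from the study. Ranks run from 1 (smallest p-value) to the total number of tests, and the monotonicity step is done by walking down from the largest p-value and keeping a running minimum.

```python
def bh_adjust(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Illustrative sketch only: assumes a plain list of p-values in [0, 1].
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value (rank m) down to the smallest (rank 1),
    # so each adjusted value never exceeds the one ranked above it.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        raw = p_values[i] * m / rank  # p-value * total tests / rank
        running_min = min(running_min, raw)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical p-values, not from the paper under discussion.
    example = [0.01, 0.04, 0.03, 0.005]
    for p, q in zip(example, bh_adjust(example)):
        print(f"p = {p:.3f} -> BH-adjusted = {q:.3f}")
```

In real analyses a tested library routine (for example the BH option in statsmodels or R's `p.adjust`) is the safer choice; the sketch is only meant to make the arithmetic visible.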

And then what that actually means is that, among results passing an FDR<0.05 cutoff (three in this study), we expect no more than about 5% to be false positives. You're right that it can be seen as stringent, but I guess that's a subjective take. It's certainly less stringent than other multiple comparisons adjustments (e.g. Bonferroni). For this study, where lots of the associations don't really have much mechanistic rationale, they're basically hypothesis-fishing to then justify follow-up research. For exploratory research like that, I think a pretty stringent FDR cutoff is very necessary.

I'm not entirely sure what you mean by the calibration of the FDR? But certainly this adjustment all depends on assumptions of test independence being true etc (which may not be the case for multiple cancers in different parts of the body). The sensitivity analysis can therefore be a better robustness check, but that too filtered out the big headline results lol.

ScienceGrump's avatar

What I mean by miscalibration is really a misspecified null: you have some mathematical function that doesn’t match the null hypothesis you claim to be testing. This can happen in lots of ways. Correlation is a very big one. But you might also assume that errors are normal when they are kurtotic. In any case where authors try to “correct” for cofactors, there are specific assumptions about the ways covariates can influence the variable of interest. Most often, it's just failing to account for a source of error altogether.

I agree it is a big problem if the correction is on a bunch of ad hoc hypotheses. BH only works if the hypotheses are systematic and unbiased. But in that case I don't trust an FDR < .05 either

Adam Rochussen's avatar

Gotcha. Makes sense, thanks!

Adam Rochussen's avatar

Thanks! Agreed. It's super common in fields where proper causal tools are limited (nutrition, climate science, sociology etc).

☔Jason Murphy's avatar

I may be weird but I get my Science news by scrolling PubMed, sorting by trending.

So I read the original paper and was not that surprised to see vegetarians getting more cancer. I was vego myself for a number of years and it leads to eating a lot of processed food!

One side-effect of scrolling PubMed is you get a strong sense of how limited science is. Lots of tiny mechanistic studies in vitro. A handful of randomised clinical trials - far fewer than you'd expect - mostly delivering very marginal benefits. Loads and loads of reviews. Most work being done on cancer (that's where the money is). And most published papers simply not being worth even a glance.

Adam Rochussen's avatar

I don’t think I’ve ever heard of anyone doing this 😂

Makes sense. You’ll get a good cross-section. You’re right about the absence of good science. Even many RCTs are really badly designed to the point that they’re basically worthless. Nutrition science seems to be particularly stained by this.

watchdominion.com's avatar

Re: Except there is nothing causal to be inferred here at all. Zero. These are purely observational data. Correlation does not imply causation.

When you say "nothing causal to be inferred", do you account for different credences? For example, prospective cohort studies (even in nutrition) have strong agreement with RCTs, so I can make a causal inference, albeit a more limited one than I would if I were reading an RCT.

Also, how do you know confounding variables are confounding, considering that we only consider them confounding based on observational data? If we claim something is confounding, we are making a causal claim. Do we have e.g. RCTs on these variables?

Adam Rochussen's avatar

Perhaps “nothing causal to be inferred here” is too strong. Certainly nothing causal is proven because it wasn’t directly tested. I guess anyone is welcome to make inferences from observational data. And some aspects of observational data can make such inferences slightly more likely to be accurate (eg dose-response relationships).

I disagree that identifying a confounder equals “making a causal claim”. A correlation can be confounded by a third variable. Perhaps this is just a semantic disagreement?

And we don’t need RCTs on third variables to know whether one of them is a confounder. Simply adjusting for the covariate reveals whether it confounded the original correlation or not.
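As a rough illustration of what "adjusting for the covariate" can look like, here is a small Python sketch that compares a crude risk ratio with risk ratios computed within strata of a third variable. All counts are invented for the example (drinking as the exposure, smoking as the covariate), and stratification is only one simple form of adjustment, so treat it as a toy under those assumptions and not as the method any particular study used.

```python
# Hypothetical (cases, total) counts, made up purely for illustration.
# Within each smoking stratum, drinkers and non-drinkers have the same risk.
counts = {
    "smoker": {"drinker": (160, 800), "non_drinker": (40, 200)},
    "non_smoker": {"drinker": (10, 200), "non_drinker": (40, 800)},
}


def risk_ratio(exposed, unexposed):
    """Risk ratio from two (cases, total) pairs."""
    return (exposed[0] / exposed[1]) / (unexposed[0] / unexposed[1])


def pooled(group):
    """Sum cases and totals for one exposure group across all strata."""
    cases = sum(stratum[group][0] for stratum in counts.values())
    total = sum(stratum[group][1] for stratum in counts.values())
    return cases, total


# Crude comparison: ignores smoking entirely.
crude_rr = risk_ratio(pooled("drinker"), pooled("non_drinker"))

# Stratified comparison: the same ratio within each smoking stratum.
stratum_rr = {
    name: risk_ratio(s["drinker"], s["non_drinker"])
    for name, s in counts.items()
}

if __name__ == "__main__":
    print(f"crude risk ratio: {crude_rr:.3f}")
    for name, rr in stratum_rr.items():
        print(f"risk ratio among {name}s: {rr:.3f}")
```

With these invented numbers the crude association between drinking and the outcome vanishes inside each smoking stratum. The sketch shows how the estimate shifts once the covariate is accounted for; it does not say which covariates ought to be adjusted for in the first place.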

watchdominion.com's avatar

I think what it means for e.g. smoking to confound observational data on the healthiness of drinking alcohol is that smoking is associated with drinking alcohol and lies in the causal chain for some of the outcomes observed in the alcohol group. If it didn't, then it would not confound the data.

I understand we can adjust for a covariate to reveal its confounding, but why did we adjust for that particular covariate to begin with? Why would we adjust for smoking? If it's simply associated with drinking, that doesn't mean it can cause any of the outcomes we measured, so it doesn't mean it can confound. Are we just calling it a confounder because it is a covariate? I can imagine a lot of spurious covariations. Why not adjust for sunscreen usage, or tattoo region, or ice cream preference, if those covary? Is there a p-hacking equivalent for adjusting for everything that covaries until we find a confounder?

Or maybe there is a principled way of sorting spurious covariation from plausibly confounding covariation, like all of our observational data on smoking.

Adam Rochussen's avatar

Yeah I think adjusting for things that don’t need to be adjusted for, or not adjusting for things that should be adjusted for, basically amounts to p-hacking. It’s true you need a hypothesis of a causal chain to decide what to adjust for etc. But in terms of proving causality, observational studies can never achieve that.

watchdominion.com's avatar

I don't think RCTs can "prove" causality either, only provide compelling evidence. I think setting the bar as proof is a bit of a strawman.

As long as a prospective cohort study adjusts for known confounders (also only known to confound through observational data), then I think we can make a good inference to causation, thanks to a high prior probability that the results will agree with an RCT and are almost guaranteed to agree at least directionally.