Haidt's Mirage
How Jonathan Haidt's misunderstanding of the DGP underlying social media and mental health among youth has him seeing effects that aren't really there
tl;dr
Using a defensible model of the data generating process (DGP) and simulation, I show how easy it is to come up with significant effects of social media use (SM) on mental health problems (MH), i.e., SM→MH effects (especially among girls), that are pure mirages: effects that appear only because your analysis fails to adequately capture the true DGP.
How to Build a Mirage in Three Easy Steps
Haidt’s understanding of the DGP appears to be something like this diagram (above). Gender is a common cause of both SM use and MH, so failing to account for it results in biased estimates of the SM→MH effect. Studies indicate that girls report more SM use and MH problems than boys, so this seems plausible. However, Haidt argues that gender is not just a confounder of the SM→MH effect but, more importantly, a moderator. That is, the magnitude of the effect of SM use on MH depends on whether you’re a boy or a girl (depicted here by the dashed orange line). Again, this seems plausible, as numerous studies have (generally) shown a stronger SM→MH effect among girls.
However, adjusting for gender (and age) still leaves the SM→MH effect prone to residual confounding by what I’m calling “Bad Stuff,” which is a dramatically over-simplified catch-all term for things occurring at the individual (e.g., MH history), family (e.g., abuse/neglect), community (e.g., poverty), or societal level (e.g., politics). Plenty of critics have pointed this out, most recently Candice Odgers. Haidt disputes these critiques, saying they don’t hold up to scrutiny. For instance, how could the proliferation of school shootings (a uniquely American phenomenon) explain rises in teen mental distress in Canada, the UK, or Australia? A lot of nuance is lost on both sides of this debate. Bad Stuff varies within and between societies, communities, families, and individuals. The Bad Stuff that potentially confounds the SM→MH effect may not be the same across countries. It’s complicated.
Instead of diving into the depths of those murky waters, my goal with this post is to show how easy it is to obtain significant SM→MH effects (especially among girls) when you fail to properly model the underlying DGP. My proposal for what that DGP looks like is above. It’s identical to the previous diagram except for the addition of a second dashed orange arrow indicating that the SM→MH effect depends on Bad Stuff (as well as gender).
This model of the DGP is obviously imperfect, as all models are, but credible (in my view). I already covered how the female→SM, female→MH, and female*SM→MH (moderation by female) effects are defensible, given the evidence. For the remaining effects, I think few would argue with the notion that Bad Stuff has an effect on MH and that women/girls just generally have it harder than men/boys (female→Bad Stuff). For the Bad Stuff→SM effect, evidence suggests that various kinds of Bad Stuff (e.g., psychological distress, family dysfunction, etc.) influence how and how long young people use SM. And for the important, but often overlooked, moderating effect of Bad Stuff: multiple studies (e.g., https://rdcu.be/dFntV) indicate that the SM→MH effect depends not only on factors related to use (e.g., frequency, duration, platform, type of use, etc.) but also on factors related to the user (e.g., personality/MH, reasons for use, etc.) and the context. To illustrate Bad Stuff’s role, let’s say that Bad Stuff is a traumatic event. The trauma leads to increased SM use (Bad Stuff→SM) and alters the magnitude of the SM→MH effect by turning what would typically be a null effect (in the absence of the traumatic event) into a harmful one (i.e., the traumatic event causes them to experience their SM use in a way that increases MH problems).
To start out, I simulated data matching the DGP above (left). The numbers next to the paths are the specified standardized effect sizes. Note that the two focal effects, the main effect of SM Use on MH Problems and the interactive effect of Female*SM on MH Problems, are specified to be zero. So, in a world where this is the underlying DGP, a correctly specified model should find that the SM→MH effect and the Female*SM→MH effect are near zero (leaving aside the additional complications of statistical power and all that jazz). However, I purposely fit a misspecified model (above right). This misspecified model, where the main effects of SM and Female and the interactive effect of SM*Female are included but Bad Stuff is not, is akin to what Haidt is doing when he presents data showing the correlation between SM use and MH problems among girls vs. boys.
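To make the setup concrete, here is a minimal Python sketch of this kind of simulation. It is not the original simulation code, and since the path values from the diagram are not reproduced in the text, every number below (other than the two focal effects being zero) is a placeholder I chose for illustration. It also assumes linear equations, unit-variance Gaussian noise, and Female coded 0/1, so the estimates will not match the ones reported in this post. The helper names `simulate` and `ols` are hypothetical.

```python
import numpy as np

def simulate(n, rng, f_sm=0.2, f_mh=0.2, f_bad=0.3, bad_sm=0.3,
             bad_mh=0.5, bad_x_sm=0.2, sm_mh=0.0, f_x_sm=0.0):
    """Draw one dataset from the assumed DGP (all path values are illustrative)."""
    female = rng.integers(0, 2, n).astype(float)            # 0 = boy, 1 = girl
    bad = f_bad * female + rng.normal(size=n)                # Female -> Bad Stuff
    sm = f_sm * female + bad_sm * bad + rng.normal(size=n)   # Female, Bad Stuff -> SM
    mh = (f_mh * female + bad_mh * bad
          + sm_mh * sm                 # focal main effect, zero by default
          + f_x_sm * female * sm       # focal interaction, zero by default
          + bad_x_sm * bad * sm        # Bad Stuff moderates the SM -> MH effect
          + rng.normal(size=n))
    return female, bad, sm, mh

def ols(y, **terms):
    """Ordinary least squares; returns coefficients keyed by term name."""
    X = np.column_stack([np.ones_like(y), *terms.values()])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["intercept", *terms], beta))

rng = np.random.default_rng(2024)
female, bad, sm, mh = simulate(200_000, rng)

# Misspecified model: SM, Female, and SM*Female, with Bad Stuff left out.
naive = ols(mh, sm=sm, female=female, female_x_sm=female * sm)

# Correctly specified model: adds Bad Stuff and the Bad Stuff*SM interaction.
full = ols(mh, sm=sm, female=female, bad=bad,
           female_x_sm=female * sm, bad_x_sm=bad * sm)

for label, fit in [("misspecified", naive), ("correct", full)]:
    print(f"{label:>12}: SM = {fit['sm']:+.3f}, Female*SM = {fit['female_x_sm']:+.3f}")
```

Under these assumed values, the misspecified fit should show clearly nonzero SM and Female*SM coefficients even though both are zero in the simulated DGP, while the correctly specified fit should return both near zero.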
So what happens when you fit this misspecified model to the data? Well, you get a sizable main effect SM→MH of β = 0.25 and a small, though noticeable, interaction effect Female*SM→MH of β = 0.04.
My next question was, “what happens when you include Bad Stuff in the model as a control variable but not as a moderator?” So the misspecified model (above) is now almost identical to the DGP; it’s just missing the Bad Stuff*SM interaction effect. Surely this is good enough to recover some unbiased effects, right? Well, not exactly. Though the inclusion of Bad Stuff did shrink the SM→MH main effect down to zero, where it should be, the interaction effect Female*SM→MH actually increased to β = 0.06. So, if we were like Haidt we might think to ourselves, “When I controlled for the confounding effects of Bad Stuff, the SM→MH effect among girls got bigger. This only bolsters my claim that SM is particularly harmful for girls.” But, no, it’s just a mirage.
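The control-but-not-moderator case can be sketched in a few lines of Python. As before, this is an illustration under my own assumptions, not the original analysis: the path values are placeholders, the equations are linear with unit-variance Gaussian noise, Female is coded 0/1, and the helper `fit` is a hypothetical name. The exact numbers will therefore differ from the β values quoted above; the point is only the qualitative pattern.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 200_000

# Hypothetical path values; the values from the post's diagram are not in the text.
F_SM, F_MH, F_BAD, BAD_SM, BAD_MH, BAD_X_SM = 0.2, 0.2, 0.3, 0.3, 0.5, 0.2

female = rng.integers(0, 2, n).astype(float)
bad = F_BAD * female + rng.normal(size=n)
sm = F_SM * female + BAD_SM * bad + rng.normal(size=n)
# SM -> MH and Female*SM -> MH are truly zero here;
# SM only matters for MH through the Bad Stuff*SM interaction.
mh = F_MH * female + BAD_MH * bad + BAD_X_SM * bad * sm + rng.normal(size=n)

def fit(y, columns):
    """Ordinary least squares with an intercept; returns a name -> coefficient dict."""
    X = np.column_stack([np.ones_like(y), *columns.values()])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["intercept", *columns], beta))

# Bad Stuff enters as a control, but its interaction with SM is omitted.
control_only = fit(mh, {"sm": sm, "female": female, "bad": bad,
                        "female_x_sm": female * sm})

print(f"SM main effect:        {control_only['sm']:+.3f}")
print(f"Female*SM interaction: {control_only['female_x_sm']:+.3f}")
```

With these assumed values, the SM main effect lands close to zero while the Female*SM coefficient stays clearly positive. In this setup the leftover interaction is driven largely by the product of the Bad Stuff*SM effect and the Female→Bad Stuff effect: girls have more Bad Stuff on average, so the omitted moderator gets attributed to gender.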
This finding made me wonder, “How does the bias in the Female*SM→MH interaction effect vary across different parameter specifications?” So I did a multiverse analysis where I fixed the SM→MH and Female*SM→MH effects to zero, the Bad Stuff→MH effect to β = 0.5, and allowed the five remaining parameters (Female→SM, Female→MH, Female→Bad Stuff, Bad Stuff→SM, and SM*BadStuff→MH) to vary across all combinations of the values [0, 0.1, 0.2, 0.3]. This resulted in a grid of 4^5 = 1024 specifications. I then generated data using my DGP for each specification, fit the “more robust” misspecified model that included everything but the BadStuff*SM interaction to each simulated dataset, and extracted the focal effect, which in this case was the Female*SM→MH interaction effect. Finally, I put everything into a specification curve plot (below), which visualizes how the bias in the focal effect varies across combinations of parameter specifications. Notably, the amount of bias in the Female*SM→MH interaction effect seems to depend mostly on the magnitude of the Female→Bad Stuff and SM*BadStuff→MH effects, while the other effects don’t appear to matter as much. I find this noteworthy because it suggests that you can still end up with a substantially biased Female*SM interaction effect even when the effect of Female→SM, Bad Stuff→SM, and/or Female→MH is zero. This means that there are many different ways that you could end up with a (substantially) biased estimate of the focal effect!
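The grid itself is easy to sketch. The Python below is an illustration of the procedure just described, not the original code: it assumes linear equations, unit-variance Gaussian noise, Female coded 0/1, and a sample size of 5,000 per specification (my choice), and the names `focal_bias` and `mean_bias` are hypothetical. Instead of drawing the specification curve, it prints the average bias for each combination of the Female→Bad Stuff and SM*BadStuff→MH values.

```python
import itertools
import numpy as np

LEVELS = [0.0, 0.1, 0.2, 0.3]
FREE = ["f_sm", "f_mh", "f_bad", "bad_sm", "bad_x_sm"]   # 4**5 = 1024 combinations
BAD_MH = 0.5                                             # fixed, as in the text

def focal_bias(p, n, rng):
    """Simulate one dataset and return the estimated Female*SM coefficient.

    The true Female*SM -> MH effect is zero, so the estimate is the bias.
    """
    female = rng.integers(0, 2, n).astype(float)
    bad = p["f_bad"] * female + rng.normal(size=n)
    sm = p["f_sm"] * female + p["bad_sm"] * bad + rng.normal(size=n)
    mh = (p["f_mh"] * female + BAD_MH * bad
          + p["bad_x_sm"] * bad * sm + rng.normal(size=n))
    # Misspecified model: Bad Stuff is a control, Bad Stuff*SM is omitted.
    X = np.column_stack([np.ones(n), sm, female, bad, female * sm])
    beta, *_ = np.linalg.lstsq(X, mh, rcond=None)
    return float(beta[4])

rng = np.random.default_rng(7)
results = []
for values in itertools.product(LEVELS, repeat=len(FREE)):
    p = dict(zip(FREE, values))
    results.append({**p, "bias": focal_bias(p, 5_000, rng)})

# A specification curve plots the estimates in sorted order.
results.sort(key=lambda r: r["bias"])

def mean_bias(**fixed):
    """Average bias over all specifications matching the fixed parameter values."""
    rows = [r["bias"] for r in results
            if all(r[k] == v for k, v in fixed.items())]
    return float(np.mean(rows))

print("rows: Female->Bad Stuff, columns: SM*BadStuff->MH")
for f_bad in LEVELS:
    row = [f"{mean_bias(f_bad=f_bad, bad_x_sm=b):+.3f}" for b in LEVELS]
    print(f_bad, row)
```

Averaging over the other three parameters is a crude stand-in for reading the plot, but under these assumptions it should show the same pattern: the bias grows as both the Female→Bad Stuff and SM*BadStuff→MH values increase and stays near zero when the moderating effect is zero.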
Obviously, it’s completely possible that I’ve made some error in reasoning or implementation, or both, so I welcome your feedback. You can check out the code I used to generate the simulations and plot here. Feel free to experiment with your own parameter specifications and/or versions of the DGP. If you do, please report back; I’m curious what you find.
I do all this not because I’m secretly a shill for Big Tech (I’m not; I actually agree that it’s probably not great to give 12-year-olds unfettered access to the internet/SM) but because it’s critical that we focus our time and energy on policies and interventions that will actually have a positive impact. I disagree with Haidt’s more extreme recommendations that call for wide-reaching government interventions that equate to banning SM and, as I wrote previously, believe these policies will do more harm than good. If my understanding of the DGP is correct (or at least correct enough), then a ton of attention and resources will be expended on something that will have no impact on youth MH, attention and resources that could be better spent on things that could actually have an impact.