The concept of nudges has proven extremely popular since Richard Thaler and Cass Sunstein published their book of the same name in 2008. Over the past 14 years, hundreds of studies have been published that could be classified as testing nudges (although the term has been applied quite loosely).
In January 2022, the journal PNAS published an attempt to combine these nudge studies into a meta-analysis. This analysis concluded that the studies had an overall effect size of d = 0.43. That's pretty substantial: it's about the same as the effect of interventions to improve children's motor skills, or the effectiveness of web-based stress management interventions.
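The d here is Cohen's d: the difference between treatment and control means, divided by their pooled standard deviation. A minimal sketch of the calculation, using made-up numbers rather than any data from the meta-analysis:

```python
import statistics

def cohens_d(treatment, control):
    """Standardized mean difference: (mean1 - mean2) / pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    v1, v2 = statistics.variance(treatment), statistics.variance(control)
    pooled_sd = (((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd

# Hypothetical outcome scores from a small trial (illustrative only)
treated = [5.2, 6.1, 5.8, 6.4, 5.9, 6.0]
untreated = [5.0, 5.5, 5.1, 5.6, 5.3, 5.4]
print(round(cohens_d(treated, untreated), 2))
```

A d of 0.43 would mean the average treated person's outcome sits 0.43 pooled standard deviations above the average control person's.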
The study also cautioned that there was evidence of publication bias. This problem, which affects many scientific disciplines, is one where academic studies are systematically more likely to be published if they show a particular result. In this case, “the true effect size of interventions is likely to be smaller than estimated,” the authors write.
Last week, PNAS published a response to this study that went much further. It claimed that properly correcting for publication bias actually eliminates any overall effect of nudges. The short article is titled "No evidence for nudging after adjusting for publication bias," and some people have used it as the basis for a blunt rejection of the nudge idea as a whole.
The tendency to do so may be understandable, given the attention the idea has garnered. But the real situation is more complicated, interesting and hopeful than this wild swing between “great effect” and “no effect” suggests.
It is quite clear that there is publication bias here. Indeed, my first reaction to the original meta-analysis was that the overall effect size was "unbelievably large", as another review put it. However, there is one crucial element missing from the story I have told so far: we have strong evidence that these kinds of interventions do have real effects in real contexts; evidence where publication bias has been taken off the table.
Around the same time as the original meta-analysis, the leading journal Econometrica published a unique study of work conducted by two prominent behavioral science organizations: the US federal government's Office of Evaluation Sciences and the US office of the Behavioural Insights Team (BIT), where I work. The study was unique because these organizations provided access to the full universe of their trials, not just those selected for publication.
Across 165 trials testing 349 interventions and reaching more than 24 million people, the analysis shows a clear, positive effect: on average, the projects produced an 8.1% improvement across a range of policy outcomes. The authors call this "important and highly statistically significant" and point out that the studies had better statistical power than comparable academic studies.
Real-world interventions therefore have an effect, regardless of publication bias. But I want to use this study to disrupt the "do nudges work?" debate, rather than adding it to one side of the scale or the other. Let's start.
An important piece of context is that both organizations mostly limited themselves to low-cost, light-touch interventions during the period studied. BIT Americas was specifically commissioned to conduct rapid and inexpensive randomized trials, for example. Cheap, scalable changes that produce these kinds of improvements are valuable to decision makers in the public and private sectors; hence the growth of organizations offering them.
But behavioral science interventions can also have greater impacts at a broader, structural level. Some of them can be seen as nudges, such as when default changes produce systemic effects in the public and private sectors. Others concern the application of behavioral science evidence to improve fundamental policy decisions in regulation, taxation or macroeconomic policy. These examples of behavioral public policy get less airtime, in part because reporting on specific experiments is sharper and more compelling.
We can now start to see the biggest problem here. We have a simplistic, binary "works" versus "doesn't work" debate. But this is based on grouping a huge range of different things under the "nudge" label and then attaching a single effect size to that label. The studies involved cover a bewildering range of interventions, settings and populations. Even seemingly similar interventions will have been implemented in different ways, and we know that these choices have significant impacts.
In other words, we have a lot of heterogeneity in attempts to influence behavior. Effects vary by context and across groups. Interestingly, both the pro- and anti-nudge articles acknowledge this, but it is the headlines about average effect size that dominate. This reinforces calls for a "heterogeneity revolution," in which we accept that effects vary (because behavior is complex) and stop acting as if only the overall effect matters.
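To make "heterogeneity" concrete: meta-analysts typically quantify it with statistics such as Cochran's Q, τ² and I² (the share of variation in effect sizes that reflects genuine between-study differences rather than sampling noise). A minimal sketch using the standard DerSimonian-Laird random-effects method, with entirely hypothetical study-level inputs:

```python
def dersimonian_laird(effects, variances):
    """Random-effects pooled estimate plus Q, tau^2 and I^2 heterogeneity statistics."""
    k = len(effects)
    w = [1.0 / v for v in variances]                             # inverse-variance weights
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)    # fixed-effect estimate
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                           # between-study variance
    i2 = max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0           # share of true heterogeneity
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * e for wi, e in zip(w_re, effects)) / sum(w_re)
    return pooled, q, tau2, i2

# Hypothetical effect sizes and sampling variances for five studies
effects = [0.10, 0.45, 0.02, 0.60, 0.30]
variances = [0.01, 0.02, 0.015, 0.03, 0.01]
pooled, q, tau2, i2 = dersimonian_laird(effects, variances)
```

A high I² (say, above 50%) signals that a single "overall effect" is hiding real differences between studies; the interesting question then becomes why those differences arise.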
Such a change would require us to temper the claims (on both sides), make them more specific, and start a more sophisticated conversation. But we can only do this if we seriously seek to understand why certain things work in certain places. And the good news is that we have clear paths to follow.
We can use machine learning to produce new, more reliable and accurate analyses of how effects vary by context and group. We can also produce meta-analyses that are narrowly focused on specific interventions or contexts (rather than a grab bag of nudges), such as the effect of defaults on meat consumption. To do this, we will need better ways of categorizing studies, or to use the more rigorous taxonomies that already exist. And we can build more systematic knowledge about how design and implementation choices interact with context, including how to successfully adapt interventions to new settings. Implementation science already knows a lot about this.
From this perspective, we can see the latest PNAS publication as a marker in the evolution of applied behavioral science: more like a half-time buzzer than a death knell. The current phase has been dominated by binary thinking that misleads and serves no one. Large and consistent effects versus no effect. The "individual framework" versus the "system framework." Nudges versus "bigger solutions." As Jeffrey Linder puts it, "Saying 'nudges work' or 'nudges don't work' is as meaningless as saying 'drugs work' or 'drugs don't work.'"
BIT will soon publish a manifesto that tries to move us past this impasse. One of its main proposals is that a much more productive frame is to view behavioral science as a lens applied to wide-ranging questions, rather than a specialized tool suited only to certain jobs. Among many other proposals, we also discuss how to understand the ways effects vary by context and group. We hope it becomes part of a more nuanced and informed conversation about behavioral science.
Disclosure: Michael Hallsworth is a Fellow of BIT, which provided financial support to Behavioral Scientist as a 2021 Organizational Partner. Organizational Partners have no role in the magazine’s editorial decisions.