• 2nd November 2021
  • 7 min read

How should we evaluate improvement interventions?

Improvement deserves evaluation – but what’s the best way of finding out what works?

Whenever a new way of diagnosing or treating people in healthcare is developed, rigorous evaluation is a key stage in the process. Everyone accepts that new drugs should be subject to careful assessment of their effectiveness, acceptability and cost. As Karolina Kuberska argued in an earlier blog in this series, evaluation is also critical in determining whether novel approaches to improvement really do live up to their promise. Well-meaning improvement efforts may not always work as intended; occasionally they can even do harm. It’s important to consider, for example, whether improvement interventions address the priorities and needs of NHS patients, carers and staff, whether they work better or worse in different circumstances, and whether they are a good use of finite healthcare resources. But exactly what does that mean for how we go about evaluating improvement?

One important issue is whether the methods used to evaluate new diagnostic or therapeutic interventions—a new drug, for example, or a new machine—are transferable to the field of improvement.

Evaluating improvement interventions: of snails and evangelists

The randomised controlled trial, or RCT, is usually seen as the best approach to evaluating new medicines. Trials provide a means of comparing the impact of an intervention with what would have happened in its absence. They are viewed as the gold standard for good reasons: they go a long way towards addressing potential sources of bias, and they allow new interventions to be compared with what’s already being done. But they are also complicated, costly and time-consuming.
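To make the logic of randomisation concrete, here is a minimal, hypothetical simulation (written in Python with numpy; none of the numbers come from a real study). It shows how a non-randomised comparison can be confounded when sicker patients are more likely to receive an intervention, while random allocation balances that unmeasured severity and recovers something close to the true effect.

```python
# A minimal, hypothetical simulation of why randomisation matters.
# In the non-randomised comparison, sicker patients are more likely to
# receive the intervention, so the naive comparison is biased; random
# allocation balances severity across arms and recovers the true effect.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
true_effect = -2.0                      # the intervention lowers the outcome by 2 units

severity = rng.normal(0, 1, n)          # unmeasured baseline severity

def outcome(treated, severity):
    # Worse (higher) outcomes for sicker patients, plus the treatment effect and noise.
    return 10 + 3 * severity + true_effect * treated + rng.normal(0, 1, len(severity))

# Non-randomised: sicker patients are more likely to get the intervention.
p_treat = 1 / (1 + np.exp(-2 * severity))
treated_obs = rng.binomial(1, p_treat)
y_obs = outcome(treated_obs, severity)
naive_estimate = y_obs[treated_obs == 1].mean() - y_obs[treated_obs == 0].mean()

# Randomised: a coin flip decides who gets the intervention.
treated_rct = rng.binomial(1, 0.5, n)
y_rct = outcome(treated_rct, severity)
rct_estimate = y_rct[treated_rct == 1].mean() - y_rct[treated_rct == 0].mean()

print(f"True effect:             {true_effect:+.2f}")
print(f"Non-randomised estimate: {naive_estimate:+.2f}  (confounded by severity)")
print(f"Randomised estimate:     {rct_estimate:+.2f}  (close to the truth)")
```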

Whether RCTs, and similar experimental approaches, are also the right approach when evaluating improvement efforts is a question that has divided the improvement studies community for some time. Frank Davidoff, and later Robert Burke and Kaveh Shojania, characterise this division as a difference between ‘evangelists’ and ‘snails’ (an odd pairing, first used to describe similarly trenchant views about how best to evaluate screening interventions in the 1970s). Snails would answer ‘yes’: improvement interventions are not so very different from other healthcare interventions; they compete for resources in the same system, and so we should hold them to the same standards of evidence before widespread adoption. Evangelists would say ‘no’: improvement is different; its potential to cause harm is lower; and improvement interventions are better understood and evaluated through alternative methods.

It’s easy to see merit in both sides of the argument. As the snails would note, the history of healthcare improvement is littered with examples of promising interventions that, when subjected to the most rigorous form of evaluation through RCTs or similar study designs, turn out to have made little difference to the outcomes that matter.

the more complex ... and the more sensitive to context, the trickier it becomes to do an RCT

 

On the other hand, as evangelists would reply, improvement interventions are self-evidently very different beasts from new pharmaceutical treatments, or even many new diagnostic innovations. Improvement interventions are complex interventions, typically involving many components, different people and lots of moving parts. They are context-dependent: they will work better in some places than others. By design, they often involve iteration, with the people involved improving and adapting the intervention as they go along. These features of improvement efforts arguably make them ill-suited to evaluation through RCTs, which tend to work best with static, well-defined interventions that are little affected by context (who’s doing them, how and where). The more complex the improvement intervention, and the more sensitive to context, the trickier it becomes to do an RCT—and very quickly an RCT approach can start to become unwieldy.

Beyond RCTs

More than this, though, RCTs don’t tell you everything. A result from an RCT will typically tell you whether, on average across the populations and contexts covered, an intervention is better, worse, or roughly equivalent to an alternative for some key outcomes of interest. What it doesn’t tell you is why and how an intervention worked, or didn’t. That may not matter so much for a medicine, but for improvement interventions—where things like the people involved can make such a crucial difference—it is vital.

process evaluations provide important additional knowledge—and are an important part of any evaluation

 

For complex interventions of this kind, process evaluations provide important additional knowledge—and are an important part of any evaluation, not just trial-based studies. The knowledge generated by process evaluations can be particularly valuable for improvement work, providing deep insights into mechanisms of action, contextual influences, how the programme played out in practice rather than as designed, and how interventions were delivered and experienced by those implementing and receiving them. Process evaluation can also generate valuable learning about promising interventions that don’t quite make the grade when it comes to demonstrating their effectiveness according to the high statistical standards of proof of an RCT, but which don’t deserve to be discarded. Take, for example, the Patient Reporting and Action for a Safe Environment intervention (see box).

Patient Reporting and Action for a Safe Environment (PRASE), developed by the Bradford Institute for Health Research, provides a means of collecting feedback from patients, while they are in hospital, about how safe they feel, with a view to helping staff to identify what is working well and what needs improving. After lengthy and careful development and piloting of the intervention, PRASE was tested using an RCT—and failed to show significant improvements on either of the two key indicators measured.

Meanwhile, a process evaluation undertaken alongside the RCT showed that many of the people using it really valued PRASE. Patients welcomed the opportunity to give feedback, and staff felt it provided them with important insights on safety. But the process evaluation also showed marked differences in the ways that staff engaged with the intervention. Even though there was a standard approach to follow, some staff adopted it more enthusiastically than others, and some were more creative in the way they used the tool to improve communication with patients and to try to initiate change.

How should we interpret the evidence on PRASE? Taking the RCT in isolation and at face value, we might see it as a ‘failed’ intervention: for all its good intentions, it did not ultimately show any improvement on key outcomes compared to usual care. But the process evaluation tells a somewhat different story. Here is an approach that is no panacea, but may nevertheless hold some value for staff and patients—particularly when placed in the hands of a creative, committed team who want to get the most out of it. Certainly this is no mandate for widespread adoption, but neither does it present an unambiguous case for abandonment. Rather, cautious adaptation might be in order—at least for staff who can see in the intervention some promising ideas for how they might better engage with patients and seek to improve safety.

What about locally led improvement initiatives?

Complex, mixed-methods studies of this kind, combining experimental designs, process evaluation and health economics, are now common, and are increasingly seen as the best way to evaluate complex interventions before they are rolled out nationally. While this is welcome, exhortations to do evaluation, and do it well, can be daunting. Much improvement work is led locally, by small teams with tight resources and limited access to evaluation expertise. Thorough process evaluation, let alone a fully fledged RCT, may be well beyond the capacity of such teams. But the improvement work they do can make an important contribution to the quality of care provided in the health service—and as such, it deserves evaluation, so that other people in the system can learn from it, replicate it, and adapt it elsewhere.

There is plenty of scope to produce useful knowledge from well executed studies that don’t follow the RCT model, drawing on approaches such as before-and-after studies and interrupted time-series analyses. There’s also scope to adapt the principles of process evaluation to small-scale improvement work. But there is limited agreement on what ‘good enough’ looks like in these situations: for example, what should always be included, and what is nice to have but not essential. Clifford Ko’s PhD research, currently in progress at THIS Institute, seeks to address this problem, with a view to developing guidance that supports rigorous, useful evaluation for improvers in all settings.
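For improvers wondering what an interrupted time-series analysis looks like in practice, the sketch below is a purely illustrative example using hypothetical monthly data and the statsmodels library: a segmented regression that estimates the baseline trend, the step change when an intervention is introduced, and any change in trend thereafter.

```python
# Minimal, illustrative interrupted time-series (segmented regression) sketch.
# The data are hypothetical; in real improvement work you would use your own
# monthly (or weekly) outcome measurements and check assumptions such as
# autocorrelation and seasonality before drawing conclusions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)

n_months = 36            # 36 months of data
intervention_month = 18  # intervention introduced halfway through

df = pd.DataFrame({"month": np.arange(n_months)})
df["post"] = (df["month"] >= intervention_month).astype(int)           # level change
df["months_since"] = np.maximum(0, df["month"] - intervention_month)   # slope change

# Simulate an outcome (e.g. incident rate per 1,000 bed days) with a modest
# drop in level and a gentle downward slope after the intervention.
df["outcome"] = (
    50
    - 0.1 * df["month"]           # pre-existing trend
    - 4 * df["post"]              # step change at the intervention
    - 0.3 * df["months_since"]    # change in trend after the intervention
    + rng.normal(0, 2, n_months)  # noise
)

# Segmented regression: baseline trend, level change, and trend change.
model = smf.ols("outcome ~ month + post + months_since", data=df).fit()
print(model.summary().tables[1])
```

In real improvement work, the coefficients on post (the immediate level change) and months_since (the change in trend) are the quantities of interest, and checks for autocorrelation and seasonality would be needed before reading too much into them.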

In seeking to evaluate improvement interventions, then, it is vital not to allow the perfect to be the enemy of the good. Large multi-centre trials, with accompanying process and economic evaluations, have an important place in health services research—and in improvement research, especially when seeking to scale up successful approaches to broader settings. But ensuring that the effort put into improvement at a local level is matched by proportionate efforts to evaluate it is also a crucial way of identifying what works, learning from experience, and ensuring that time invested in improvement bears fruit for patients, carers and staff.

The debates covered in this blog are considered at greater length in a recent book chapter written by Graham with Jane O’Hara and Justin Waring, ‘Large-scale mixed-methods evaluation of safety programmes and interventions’, published in Inside hazardous technological systems: methodological foundations, challenges and future directions (CRC Press, 2021). You can download a PDF of the chapter here.