The thing that gives me the biggest buzz when it comes to improvement is to see a team share their story of what they have learned-by-doing; and what they have delivered that improves their quality of life and the quality of their patients’ experience.

And while the principles that underpin these transformations are generic, each story is unique because no two improvement challenges are exactly the same and no two teams are exactly the same.

The improvement process is not a standardised production line.  It is a much more organic and adaptive experience, and that requires calm, competent, consistent, compassionate and courageous facilitation.

So when I see a team share their story of what they have done and learned then I know that behind the scenes there will have been someone providing that essential ingredient.

This week a perfect example of a story like this was shared.

It is about the whole team who run the Diabetic Complex Cases Clinic at Guy’s and St. Thomas’ NHS Trust in London.  Everyone involved in the patient care was involved.  It tells the story of how they saw what might be possible and how they stepped up to the challenge of learning to apply the same principles in their world.  And it tells their story of what they diagnosed, what they designed and what they delivered.

The facilitation and support was provided by Ellen Pirie, who works for the Health Innovation Network (HIN) in South London and who is a Level 2 Health Care Systems Engineer.

And the link to the GSTT Diabetic Complex Clinic Team story is here.

In 1986, Dr Don Berwick from Boston attended a 4-day seminar run by Dr W. Edwards Deming in Washington.  Dr Berwick was a 40 year old paediatrician who was also interested in health care management and improving quality and productivity.  Dr Deming was an 86 year old engineer and statistician who, when he was in his 40’s, helped the US to improve the quality and productivity of the industrial processes supporting the US and Allies in WWII.

Don Berwick describes attending the seminar as an emotionally challenging, life-changing experience in which he realised that his well-intended attempts to improve quality by inspection-and-correction were a counterproductive, abusive approach that led to fear, demotivation and erosion of pride-in-work.  His blinding new clarity of insight led directly to the founding of the Institute for Healthcare Improvement in the USA in the early 1990’s.

One of the tenets of Dr Deming’s theories is that the ingrained beliefs and behaviours that erode pride-in-work also lead to the very outcomes that management do not want – namely conflict between managers and workers and economic failure.

So, an explicit focus on improving pride-in-work as an early objective in any improvement exercise makes very good economic sense, and is a sign of wise leadership and competent management.


Last week a case study was published that illustrates exactly that principle in action.  The important message in the title is “restore the calm”.

One of the most demotivating aspects of health care that many complain about is the stress caused by a chaotic environment, chronic crisis and perpetual firefighting.  So, anything that can restore calm will, in principle, improve motivation – and that is good for staff, patients and organisations.

The case study describes, in detail, how calm was restored in a chronically chaotic chemotherapy day unit … on Weds, June 19th 2019 … in one day and at no cost!

To say that the chemotherapy nurses were surprised and delighted is an understatement.  They were amazed to see that they could treat the same number of patients, with the same number of staff, in the same space and without the stress and chaos.  And they had time to keep up with the paperwork; and they had time for lunch; and they finished work 2 hours earlier than previously!

Such a thing was not possible surely? But here they were experiencing it.  And their patients noticed the flip from chaos-to-strangely-calm too.

The impact of the one-day-test was so profound that the nurses voted to adopt the design change the following week.  And they did.  And the restored calm has been sustained.


What happened next?

The chemotherapy nurses were able to catch up with their time-owing that had accumulated from the historical late finishes.  And the problem of high staff turnover and difficulty in recruitment evaporated.  Highly-trained chemotherapy nurses who had left because of the stressful chaos now want to come back.  Pride-in-work has been re-established.  There are no losers.  It is a win-win-win result for staff, patients and organisations.


So, how was this “miracle” achieved?

Well, first of all it was not a miracle.  The flip from chaos-to-calm was predicted to happen.  In fact, that was the primary objective of the design change.

So, how was this design change achieved?

By establishing the diagnosis first – the primary cause of the chaos – and it was not what the team believed it was.  And that is the reason they did not believe the design change would work; and that is the reason they were so surprised when it did.

So, how was the diagnosis achieved?

By using an advanced systems engineering technique called Complex Physical System (CPS) modelling.  That was the game changer!  All the basic quality improvement techniques had been tried and had not worked – process mapping, direct observation, control charts, respectful conversations, brainstorming, and so on.  The system structure was too complicated. The system behaviour was too complex (i.e. chaotic).

What CPS revealed was that the primary cause of the chaotic behaviour was the work scheduling policy.  And with that clarity of focus, the team were able to re-design the policy themselves using a simple paper-and-pen technique.  That is why it cost nothing to change.

So, why hadn’t they been able to do this before?

Because systems engineering is not a taught component of the traditional quality improvement offerings.  Healthcare is rather different to manufacturing! As the complexity of the health care system increases we need to learn the more advanced tools that are designed for this purpose.

What is the same is the principle of restoring pride-in-work and that is what Dr Berwick learned from Dr Deming in 1986, and what we saw happen on June 19th, 2019.

To read the story of how it was done click here.

Innovation means anything new, and new ideas spread through groups of people in a characteristic way that was described by Everett Rogers in the 1960’s.

The evidence showed that innovation starts with the small minority of innovators (about 2%) and diffuses through the population – first to the bigger minority called early adopters.

Later, it became apparent that the diffusion path was not smooth and that there was a chasm into which many promising innovations fell and from which they did not emerge.

If this change chasm can be bridged then a tipping point is achieved when wider adoption by the majority becomes much more likely.

And for innovations that fundamentally change the way we live and work, this whole process can take decades! Generations even.

Take mobile phones and the Internet as good examples. How many can remember life before those innovations?  And we are living the transition to renewable energy, artificial intelligence and electric cars.


So, it is very rewarding to see growing evidence that the innovators who started the health care improvement movement back in the 1990’s, such as Dr Don Berwick in the USA and Dr Kate Silvester in the UK, have grown a generation of early adopters who now appear to have crossed the chasm.

The evidence for that can be found on the NHS Improvement website – for example the QSIR site (Quality, Service Improvement and Redesign).

Browsing through the QSIR catalogue of improvement tools I recognised them all from previous incarnations developed and tested by the NHS Modernisation Agency and NHS Institute for Innovation and Improvement.  And although those organisations no longer exist, they served as incubators for the growing community of healthcare improvement practitioners (CHIPs) and their legacy lives on.

This is all good news because we now also have a new NHS Long Term Plan which sets out an ambitious vision for the next 10 years and it is going to need a lot of work from the majority of people who work in the NHS to deliver. That will need capability-at-pace-and-scale.

And this raises some questions:

Q1: Will the legacy of the MA and NHSi scale to meet the more challenging task of designing and delivering the vision of a system of Integrated Care Systems (ICS) that include primary care, secondary care, community care, mental health and social care?

Q2: Will some more innovation be required?

If history is anything to go by, then I suspect the answers will be “Q1: No” and “Q2: Yes”.

Bring it on!

It is 50 years ago today that Apollo 11 landed men on the moon – and the final small step for one man, astronaut Neil Armstrong, was indeed a giant leap for mankind.

Achieving that goal was the result of a massive programme of inspiration, innovation, investment, exploration and emergent learning that has led directly to many of the everyday things that we take for granted.  Portable computers and the Internet are just two spin-offs that our 21st Century society could not function without.


I have also just finished reading “Into the Black” which is the gripping story of the first flight of the Space Shuttle in April 1981.

This was another giant technical and cultural leap that paved the way for the International Space Station.

 

And it has not been plain sailing.  There have been very visible disasters that have shocked the world and challenged our complacency.

This is how complex system design is – few notice when it works – and everyone notices when it fails.

The emerging body of knowledge that NASA used is called systems engineering and it can be applied to any system, of any size and any complexity.

And that includes health care.

So, today is a time to pause, reflect and celebrate these awe inspiring achievements.  And to draw hope from them because the challenges that health care faces today require no less a commitment to investment in learning how to improve-by-design.

That is health care systems engineering.

This is the name given to an endemic, chronic, systemic, design disease that afflicts the whole NHS that very few have heard of, and even fewer understand.

This week marked two milestones in the public exposure of this elusive but eminently treatable health care system design illness that causes queues, delays, overwork, chaos, stress and risk for staff and patients alike.

The first was breaking news from the team in Swansea led by Chris Jones.

They had been grappling with the wicked problem of chronic queues, delays, chaos, stress, high staff turnover, and escalating costs in their Chemotherapy Day Unit (CDU) at the Singleton Hospital.

The breakthrough came earlier in the year when we used the innovative eleGANTT® system to measure and visualise the CDU chaos in real-time.

This rich set of data enabled us, for the first time, to apply a powerful systems engineering technique called counterfactual analysis which revealed the primary cause of the chaos – the elusive and counter-intuitive design disease carveoutosis multiforme fulminans.

And this diagnosis implied that the chaos could be calmed quickly and at no cost.

But that news fell on deaf ears because, not surprisingly, the CDU team were highly sceptical that such a thing was possible.

So, to convince them we needed to demonstrate the adverse effect of carveoutosis in a way that was easy to see.  And to do that we used some advanced technology: dice and tiddly winks.

The reaction of the CDU nurses was amazing.  As soon as they ‘saw’ it they ‘clicked’ and immediately grasped how to apply it in their world.  They designed the change they needed to make in a matter of minutes.
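For readers who would rather see the effect in numbers than in tiddly winks, here is a minimal sketch in Python of a dice game of this general kind.  It is not the actual game, and it is not the CDU design: the rules are assumptions made purely for illustration.  Two streams of patients each generate a demand of one die roll per day.  In the carved-out design each stream has its own ring-fenced slots and an unused slot in one stream cannot help the other.  In the pooled design the same total number of slots is shared.  The function name `simulate` and its parameters are hypothetical.

```python
import random


def simulate(days=1000, slots_per_stream=4, seed=1):
    """Compare average backlog for carved-out versus pooled capacity.

    Assumed rules (illustrative only): two patient streams, each with a
    daily demand of one six-sided die roll; total capacity is the same in
    both designs.
    """
    rng = random.Random(seed)
    carved = [0, 0]          # separate backlog for each ring-fenced stream
    pooled = 0               # single shared backlog
    carved_sum = 0
    pooled_sum = 0
    for _ in range(days):
        demand = [rng.randint(1, 6), rng.randint(1, 6)]
        # Carved-out: each stream can only use its own slots.
        for i in range(2):
            carved[i] = max(0, carved[i] + demand[i] - slots_per_stream)
        # Pooled: the same total number of slots serves both streams.
        pooled = max(0, pooled + sum(demand) - 2 * slots_per_stream)
        carved_sum += sum(carved)
        pooled_sum += pooled
    return carved_sum / days, pooled_sum / days


carved_mean, pooled_mean = simulate()
print(f"average backlog, carved-out: {carved_mean:.2f}")
print(f"average backlog, pooled:     {pooled_mean:.2f}")
```

Because both designs see exactly the same dice rolls, the pooled backlog in this sketch can never be bigger than the carved-out backlog: the staff, space and demand are identical and only the scheduling policy differs.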


But the proof-of-the-pudding-is-in-the-eating and we arranged a one-day-test-of-change of their anti-carveout design.

The appointed day arrived, Wednesday 19th June.  The CDU nurses implemented their new design, which cost nothing to change.  Within an hour of the day starting they reported that the CDU was strangely calm.   And at the end of the day they reported that it had remained strangely calm all day; and that they had time for lunch; and that they had time to do all their admin as they went; and that they finished on time; and that the patients did not wait for their chemotherapy; and that the patients noticed the chaos-to-calm transformation too.

They treated just the same number of patients as usual with the same staff, in the same space and with the same equipment.  It cost nothing to make the change.

To say that they were surprised is an understatement.  They were so surprised and so delighted that they did not want to go back to the old design – but they had to because it was only a one-day-test-of-change.

So, on Thursday and Friday they reverted back to the carveoutosis design.  And the chaos returned.  That nailed it!  There was a riot!!  The CDU nurses refused to wait until later in the year to implement the new design and they voted unanimously to implement it from the following Monday.  And they did.  And calm was restored.


The second milestone happened on Thursday 11th July when we ran a Health Care Systems Engineering (HCSE) Masterclass on the very same topic … chronic systemic carveoutosis multiforme fulminans.

This time we used the dice and tiddly winks to demonstrate the symptoms, signs and the impact of treatment.  Then we explored the known pathophysiology of this elusive and endemic design disease in much more depth.

This is health care systems engineering in action.

It works.

One of the most surprising aspects of systems is how some big changes have no observable effect and how some small changes are game-changers. Why is that?

The technical name for this phenomenon is leverage points.

When a nudge is made at a leverage point in a real system the impact is amplified – so a small cause can have a big effect.

And when a big kick is made where there is no leverage point the effort is dissipated. Like flogging a dead horse.

Other names for leverage points are triggers, buttons, catalysts, fuses etc.


The fact that there is a big effect does not imply it is a good effect.

Poking a leverage point can trigger a catastrophe just as it can trigger a celebration. It depends on how it is poked.

Perhaps that is one reason people stay away from them.

But when our health care system performance is in decline, if we do nothing or if we act but stay away from leverage points (i.e. flog the dead horse) then we will deny ourselves the opportunity of improvement.

So, we need a way to (a) identify the leverage points and (b) know how to poke them positively and know how to not poke them into delivering a catastrophe.


Here are a couple of real examples.


The time-series chart above shows the A&E performance of a real acute trust.  Notice the pattern as we read left-to-right; baseline performance is OKish and dips in the winters, and the winter dips get deeper but the baseline performance recovers.  In April 2015 (yellow flag) the system behaviour changes, and it goes into a steady decline with added winter dips.  This is the characteristic pattern of poking a leverage point in the wrong way … and the fact it happened at the start of the financial year suggests that Finance was involved.  Possibly triggered by a cost-improvement programme (CIP) action somewhere else in the system.  Save a bit of money here and create a bigger problem over there. That is how systems work. Not my budget so not my problem.

Here is a different example, again from a real hospital and around the same time.  It starts with a similar pattern of deteriorating performance and there is a clear change in system behaviour in Jan 2015.  But in this case the performance improves and stays improved.  Again, the visible sign of a leverage point being poked but this time in a good way.

In this case I do know what happened.  A contributory cause of the deteriorating performance was correctly diagnosed, the leverage point was identified, a change was designed and piloted, and then implemented and validated.  And it worked as predicted.  It was not a fluke.  It was engineered.


So what is the reason that the first example is much more commonly seen than the second?

That is a very good question … and to answer it we need to explore the decision making process that leads up to these actions because I refuse to believe that anyone intentionally makes decisions that lead to actions that lead to deterioration in health care performance.

And perhaps we can all learn how to poke leverage points in a positive way?

This recent tweet represents a significant milestone.  It formally recognises and celebrates in public the impact that developing health care systems engineering (HCSE) capability has had on the culture of the organisation.

What is also important is that the HCSE training was not sought and funded by the Trust, it was discovered by chance and funded by their commissioners, the local clinical commissioning group (CCG).


The story starts back in the autumn of 2017 and, by chance, I was chatting with Rob, a friend-of-a-friend, about work. As you do. It turned out that Rob was the CCG Lead for Unscheduled Care and I was describing how HCSE can be applied in any part of any health care system; primary care, secondary care, scheduled, unscheduled, clinical, operational or whatever.  They are all parts of the same system and the techniques and tools of improvement-by-design are generic.  And I described lots of real examples of doing just that and the sustained improvements that had followed.

So he asked “If you were to apply this approach to unscheduled care in a large acute trust how would you do it?”.  My immediate reply was “I would start by training the front line teams in the HCSE Level 1 stuff, and the first step is to raise awareness of what is possible.  We do that by demonstrating it in practice because you have to see it and experience it to believe it.”

And so that is what we did.

The CCG commissioned a one-year HCSE Level 1 programme for four teams at University Hospitals of North Midlands (UHNM) and we started in January 2018 with some One Day Flow Workshops.

The intended emotional effect of a Flow Workshop is surprise and delight.  The challenge for the day is to start with a simulated, but very realistic, one-stop outpatient clinic which is chaotic and stressful for everyone.  And with no prior training the delegates transform it into a calm and enjoyable experience using the HCSE approach.  It is called emergent learning.  We have run dozens of these workshops and it has never failed.

After directly experiencing HCSE working in practice the teams that stepped up to the challenge were from ED, Transformation, Ambulatory Emergency Care and Outpatients.


The key to growing HCSE capability is to assemble small teams, called micro-system design teams (MSDTs) and to focus on causes that fall inside their circle of control.

The MSDT sessions need to be regular, short, and facilitated by an experienced HCSE who has seen it, done it and can teach it.

In UHNM, the Transformation team divided themselves between the front-line teams and they learned HCSE together.  Here’s a picture of the ED team … left to right we have Alex, Mark and Julie (ED consultants) then Steve and Janina (Transformation).  The essential tools are a big table, paper, pens, notebooks, coffee and a laptop/projector.

The purpose of each session is empirical learning-by-doing i.e. using a real improvement challenge to learn and practice the method so that before the end of the programme the team can confidently “fly” solo.

That is the key to continued growth and sustained improvement.  The HCSE capability needs to become embedded.

It is good fun and immensely rewarding to see the “ah ha” moments and improvements happen as the needle on the emotometer moves from “Can’t Do” to “Can Do”.

Metamorphosis is re-arranging what you already have in a way that works better.


The tweet is objective evidence that demonstrates the HCSE programme delivers as designed.  It is fit-for-purpose.  It is called validation.

The other objective evidence of effectiveness comes from the learning-by-doing projects themselves.  And for an individual to gain a coveted HCSE Level 1 Certificate of Competency requires writing up to a publishable quality and sharing the story. Warts-and-all.

To read the full story just click here.

And what started this was the CCG who had the strategic vision, looked outside themselves for innovative approaches, and demonstrated the courage to take a risk.

Commissioned Improvement.

One of the big hurdles in health care improvement is that most of the low hanging fruit have been harvested.

These are the small improvement projects that can be done quickly because as soon as the issue is made visible to the stakeholders the cause is obvious and the solution is too.

This is where kaizen works well.

The problem is that many health care issues are rather more difficult because the process that needs improving is complicated (i.e. it has lots of interacting parts) and usually exhibits rather complex behaviour (e.g. chaotic).

One good example of this is a one stop multidisciplinary clinic.

These are widely used in healthcare and for good reason.  It is better for a patient with a complex illness, such as diabetes, to be able to access whatever specialist assessment and advice they need when they need it … i.e. in an outpatient clinic.

The multi-disciplinary team (MDT) is more effective and efficient when it can problem-solve collaboratively.

The problem is that the scheduling design of a one stop clinic is rather trickier than a traditional simple-but-slow-and-sequential new-review-refer design.

A one stop clinic that has not been well-designed feels chaotic and stressful for both staff and patients and usually exhibits the paradoxical behaviour of waiting patients and waiting staff.


So what do we need to do?

We need to map and measure the process and diagnose the root cause of the chaos, and then treat it.  A quick kaizen exercise should do the trick. Yes?

But how do we map and measure the chaotic behaviour of lots of specialists buzzing around like blue-***** flies trying to fix the emergent clinical and operational problems on the hoof?  This is not the linear, deterministic, predictable, standardised machine-dominated production line environment where kaizen evolved.

One approach might be to get the staff to audit what they are doing as they do it. But that adds extra work, usually makes the chaos worse, fuels frustration and results in a very patchy set of data.

Another approach is to employ a small army of observers who record what happens, as it happens.  This is possible and it works, but to be able to do this well requires a lot of experience of the process being observed.  And even if that is achieved the next barrier is the onerous task of transcribing and analysing the ocean of harvested data.  And then the challenge of feeding back the results much later … i.e. when the sands have shifted.


So we need a different approach … one that is able to capture the fine detail of a complex process in real-time, with minimal impact on the process itself, and that can process and present the wealth of data in a visual easy-to-assess format, and in real-time too.

This is a really tough design challenge …
… and it has just been solved.

Here are two recent case studies that describe how it was done using a robust systems engineering method.


On Thursday we had a very enjoyable and educational day.  I say “we” because there were eleven of us learning together.

There was Declan, Chris, Lesley, Imran, Phil, Pete, Mike, Kate, Samar, Ellen and me (behind the camera).  Some are holding their long-overdue HCSE Level-1 Certificates and Badges that were awarded just before the photo was taken.

The theme for the day was System Dynamics which is a tried-and-tested approach for developing a deep understanding of how a complex adaptive system (CAS) actually works.  A health care system is a complex adaptive system.

The originator of system dynamics is Jay Wright Forrester, who developed it at MIT in the 1950’s (i.e. over 60 years ago).  Peter Senge, author of The Fifth Discipline, was part of the same group, as was Donella Meadows, who co-wrote Limits to Growth.  Their dream was much bigger – global health – i.e. the whole planet not just the human passengers!  It is still a hot topic [pun intended].


The purpose of the day was to introduce the team of apprentice health care system engineers (HCSEs) to the principles of system dynamics and to some of its amazing visualisation and prediction techniques and tools.

The tangible output we wanted was an Excel-based simulation model that we could use to solve a notoriously persistent health care service management problem …

How to plan the number of new and review appointment slots needed to deliver a safe, efficient, effective and affordable chronic disease service?

So, with our purpose in mind, the problem clearly stated, and a blank design canvas we got stuck in; and we used the HCSE improvement-by-design framework that everyone was already familiar with.

We made lots of progress, learned lots of cool stuff, and had lots of fun.

We didn’t quite get to the final product but that was OK because it was a very tough design assignment.  We got 80% of the way there though which is pretty good in one day from a standing start.  The last 20% can now be done by the HCSEs themselves.
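To give a flavour of what a stock-and-flow model of this problem can look like, here is a minimal sketch written in Python rather than Excel.  It is not the model we built on the day.  The structure (one stock of patients under follow-up, filled by new patients and drained by discharges at review) and every number in it are assumptions chosen for illustration, and the name `simulate_clinic` and its parameters are hypothetical.

```python
def simulate_clinic(weeks, new_per_week=10.0, follow_up_fraction=0.8,
                    review_interval_weeks=12.0, discharge_fraction=0.1):
    """Step a one-stock model of a chronic disease clinic week by week.

    Stock:   caseload = patients currently under follow-up.
    Inflow:  new patients who need follow-up after their first visit.
    Outflow: patients discharged at a review appointment.
    All parameter values are illustrative assumptions.
    """
    caseload = 0.0
    history = []
    for week in range(weeks):
        # Each patient on the caseload is reviewed once per interval.
        review_slots = caseload / review_interval_weeks
        inflow = new_per_week * follow_up_fraction
        outflow = review_slots * discharge_fraction
        caseload += inflow - outflow
        history.append((week, new_per_week, review_slots, caseload))
    return history


history = simulate_clinic(weeks=2000)
week, new_slots, review_slots, caseload = history[-1]
print(f"new slots per week:    {new_slots:.1f}")
print(f"review slots per week: {review_slots:.1f}")
print(f"caseload:              {caseload:.0f}")
```

In this sketch the review demand settles where discharges balance the inflow, that is at new patients per week × follow-up fraction ÷ discharge fraction.  A small change in the discharge fraction or the review interval therefore moves the number of review slots needed by a lot, and the model makes that visible before anyone changes a clinic template.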

We were all exhausted at the end.  We had worked hard.  It was a good day.


And I am already looking forward to the next HCSE Masterclass that will be in about six weeks’ time.  This one will address another chronic, endemic, systemic health care system “disease” called carveoutosis multiforme fulminans.

This week saw the publication of a landmark paper – one that will bring hope to many.  A paper that describes the first step of a path forward out of the mess that healthcare seems to be in.  A rational, sensible, practical, learnable and enjoyable path.


This week I also came across an idea that triggered an “ah ha” for me.  The idea is that the most rapid learning happens when we are making mistakes about half of the time.

And when I say ‘making a mistake’ I mean not achieving what we predicted we would achieve because that implies that our understanding of the world is incomplete.  In other words, when the world does not behave as we expect, we have an opportunity to learn and to improve our ability to make more reliable predictions.

And that ability is called wisdom.


When we get what we expect about half the time, and do not get what we expect about the other half of the time, then we have the maximum amount of information that we can use to compare and find the differences.
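One way to make this idea concrete, assuming we treat each prediction as a simple hit-or-miss outcome, is Shannon's binary entropy.  It measures the average information (in bits) gained per outcome when the chance of a miss is p.  This is my reading of the idea rather than a claim about how it was originally stated, and the function name `binary_entropy` is just an illustrative choice.

```python
import math


def binary_entropy(p):
    """Average information, in bits, from one hit-or-miss outcome
    when the probability of a miss is p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome tells us nothing new
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


for p in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9, 1.0):
    print(f"miss rate {p:.2f}: {binary_entropy(p):.3f} bits per outcome")
```

The curve is symmetrical and peaks at a miss rate of one half, where each outcome carries exactly one bit.  If we are always right or always wrong, the outcome is no surprise and there is nothing to compare.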

Was it what we did? Was it what we did not do? What are the acts and errors of commission and omission? What can we learn from those? What might we do differently next time? What would we expect to happen if we do?


And to explore this terrain we need to see the world as it is … warts and all … and that is the subject of the landmark paper that was published this week.


The context of the paper is improvement of cancer service delivery, and specifically of reducing waiting time from referral to first appointment.  This waiting is a time of extreme anxiety for patients who have suspected cancer.

It is important to remember that most people with suspected cancer do not have it, so most of the work of an urgent suspected cancer (USC) clinic is to reassure and to relieve the fear that the spectre of cancer creates.

So, the sooner that reassurance can happen the better, and for the unlucky minority who are diagnosed with cancer, the sooner they can move on to treatment the better.

The more important paragraph in the abstract is the second one … which states that seeing the system behaviour as it is, warts-and-all,  in near-real-time, allows us to learn to make better decisions of what to do to achieve our intended outcomes. Wiser decisions.

And the reason this is the more important paragraph is because if we can do that for an urgent suspected cancer pathway then we can do that for any pathway.


The paper re-tells the first chapter of an emerging story of hope.  A story of how an innovative and forward-thinking organisation is investing in building embedded capability in health care systems engineering (HCSE), and is now delivering a growing dividend.  Much bigger than the investment on every dimension … better safety, faster delivery, higher quality and more affordability. Win-win-win-win.

The only losers are the “warts” – the naysayers and the cynics who claim it is impossible, or too “wicked”, or too difficult, or too expensive.

Innovative reality trumps cynical rhetoric … and the full abstract and paper can be accessed here.

So, well done to Chris Jones and the whole team in ABMU.

And thank you for keeping the candle of hope alight in these dark, stormy and uncertain times for the NHS.

This week, it was my great pleasure to award the first Health Care Systems Engineering (HCSE) Level 2 Medal to Dr Kate Silvester, MBA, FRCOphth.

Kate is internationally recognised as an expert in health care improvement and over more than two decades has championed the adoption of improvement methods such as Lean and Quality Improvement in her national roles in the Modernisation Agency and then the NHS Institute for Innovation and Improvement.

Kate originally trained as a doctor and then left the NHS to learn manufacturing systems engineering with Lucas and Airbus.  Kate then brought these very valuable skills back with her into the NHS when she joined the Cancer Services Collaborative.

Kate is co-founder of the Journal of Improvement Science and over the last five years has been highly influential in the development of the Health Care Systems Engineering Programme – the first of its kind in the world that is designed by clinicians for clinicians.

The HCSE Programme is built on the pragmatic See One-Do Some-Teach Many principle of developing competence and confidence through being trained and coached by a more experienced practitioner while doing projects of increasing complexity and training and coaching others who are less experienced.

Competence is based on evidence-of-effectiveness, and Kate has achieved HCSE Level 2 by demonstrating that she can do HCSE and that she can teach and coach others how to do HCSE as well.

To illustrate, here is a recent FHJ paper that Kate has authored which illustrates the HCSE principles applied in practice in a real hospital.  This work was done as part of the Health Foundation’s Flow, Cost and Quality project that Kate led and recent evidence proves that the improvements have been sustained and have spread.  South Warwickshire NHS Foundation Trust is now one of the top-performing Trusts in the NHS.

More recently, Kate has trained and coached new practitioners in Exeter and North Devon who have delivered improvements and earned their HCSE 1 wings.

Congratulations Kate!

One of the most frequent niggles that I hear from patients is the difficulty they have getting an appointment with their general practitioner.  I too have personal experience of the distress caused by the ubiquitous “Phone at 8AM for an Appointment” policy, so in June 2018 when I was approached to help a group of local practices redesign their appointment booking system I said “Yes, please!”


What has emerged is a fascinating, enjoyable and rewarding journey of co-evolution of learning and co-production of an improved design.  The multi-skilled design team (MDT) we pulled together included general practitioners, receptionists and practice managers and my job was to show them how to use the health care systems engineering (HCSE) framework to diagnose, design, decide and deliver what they wanted: A safe, calm, efficient, high quality, value-4-money appointment booking service for their combined list of 50,000 patients.


This week they reached the start of the ‘decide and deliver‘ phase.  We have established the diagnosis of why the current booking system is not delivering what we all want (i.e. patients and practices), and we have assembled and verified the essential elements of an improved design.

And the most important outcome for me is that the Primary Care MDT now feel confident and capable to decide what to deliver, and how to deliver it, themselves.   That is what I call embedded capability and achieving it is always an emotional roller coaster ride that we call The Nerve Curve.

What we are dealing with here is called a complex adaptive system (CAS) which has two main components: Processes and People.  Both are complicated and behave in complex ways.  Both will adapt and co-evolve over time.  The processes are the result of the policies that the people produce.  The policies are the result of the experiences that the people have and the explanations that they create to make intuitive sense of them.

But, complex systems often behave in counter-intuitive ways, so our intuition can actually lead us to make unwise decisions that unintentionally perpetuate the problem we are trying to solve.  The name given to this is a wicked problem.

A health care systems engineer needs to be able to demonstrate where these hidden intuitive traps lurk, and to explain what causes them and how to avoid them.  That is the reason the diagnosis and design phase is always a bit of a bumpy ride – emotionally – our Inner Chimp does not like to be challenged!  We all resist change.  Fear of the unknown is hard-wired into us by millions of years of evolution.

But we know when we are making progress because the “ah ha” moments signal a slight shift of perception and a sudden new clarity of insight.  The cognitive fog clears a bit and some more of the unfamiliar terrain ahead comes into view.  We are learning.

The Primary Care MDT have experienced many of these penny-drop moments over the last six months and unfortunately there is not space here to describe them all, but I can share one pivotal example.


A common symptom of a poorly designed process is a chronically chaotic queue.

[NB. In medicine the term chronic means “long standing”.  The opposite term is acute which means “recent onset”].

Many assume, intuitively, that the cause of a chronically chaotic queue is lack of capacity; hence the incessant calls for ‘more capacity’.  And it appears that we have learned this reflex response by observing the effect of adding capacity – which is that the queue and chaos abate (for a while).  So that proves that lack of capacity was the cause. Yes?

Well actually it doesn’t.  Proving causality requires a bit more work.  And to illustrate this “temporal association does not prove causality” trap I invite you to consider this scenario.

I have a headache => I take a paracetamol => my headache goes away => so the cause of my headache was lack of paracetamol. Yes?

Errr .. No!

There are many contributory causes of chronically chaotic queues and lack of capacity is not one of them because the queue is chronic.  What actually happens is that something else triggers the onset of chaos which then consumes the very resource we require to avoid the chaos.  And once we slip into this trap we cannot escape!  The chaos-perpetuating behaviour we observe is called fire-fighting and the necessary resource it consumes is called resilience.


Six months ago, the Primary Care MDT believed that the cause of their chronic appointment booking chaos was a mismatch between demand and capacity – i.e. too much patient demand for the appointment capacity available.  So, there was a very reasonable resistance to the idea of making the appointment booking process easier for patients – they justifiably feared being overwhelmed by a tsunami of unmet need!

Six months on, the Primary Care MDT understand what actually causes chronic queues and that awareness has been achieved by a step-by-step process of explanation and experimentation in the relative safety of the weekly design sessions.

We played simulation games – lots of them.

One particularly memorable “Ah Ha!” moment happened when we played the Carveout Game which is done using dice, tiddly-winks, paper and coloured-pens.  No computers.  No statistics.  No queue theory gobbledygook.  No smoke-and-mirrors.  No magic.

What the Carveout Game demonstrates, practically and visually, is that an easy way to trigger the transition from calm-efficiency to chaotic-ineffectiveness is … to impose a carveout policy on a system that has been designed to achieve optimum efficiency by using averages.  Boom!  We slip on the twin banana skins of the Flaw-of-Averages and Sub-Optimisation, slide off the performance cliff, and career down the rocky slope of Chronic Chaos into the Depths of Despair – from which we cannot then escape.

This visual demonstration was a cognitive turning point for the MDT.  They now believed that there is a rational science to improvement and from there we were on the step-by-step climb to building the necessary embedded capability.
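For readers who prefer to see the arithmetic, here is a minimal sketch of the carveout effect.  It is not the Carveout Game itself (that uses dice and tiddly-winks); the two dice-driven demand streams, the 4 slots per pot, and the function name `simulate_backlog` are all assumptions made for illustration.  It compares the same demand served from one shared pot and from two ring-fenced pots.

```python
import random


def simulate_backlog(demands, capacities, pooled):
    """Return the total backlog at the end of each day.

    demands:    list of (stream_a, stream_b) daily request counts.
    capacities: (slots_a, slots_b) available each day.
    pooled:     True lets any slot serve any request; False ring-fences them.
    """
    backlog_a = backlog_b = 0
    history = []
    for demand_a, demand_b in demands:
        backlog_a += demand_a
        backlog_b += demand_b
        if pooled:
            # One shared pot: only the totals matter.
            total = max(0, backlog_a + backlog_b - sum(capacities))
            backlog_a, backlog_b = total, 0
        else:
            # Carveout: spare slots in one pot cannot help the other.
            backlog_a = max(0, backlog_a - capacities[0])
            backlog_b = max(0, backlog_b - capacities[1])
        history.append(backlog_a + backlog_b)
    return history


# Hypothetical demand: one die roll per stream per day (average 3.5).
rng = random.Random(1)
demands = [(rng.randint(1, 6), rng.randint(1, 6)) for _ in range(60)]
capacities = (4, 4)  # assumed: set just above the average demand

pooled_backlog = simulate_backlog(demands, capacities, pooled=True)
carved_backlog = simulate_backlog(demands, capacities, pooled=False)
print("worst backlog, pooled:  ", max(pooled_backlog))
print("worst backlog, carveout:", max(carved_backlog))
```

Whatever the dice produce, the ring-fenced backlog in this sketch can never be smaller than the pooled one, because a slot wasted in one pot is never available to the other.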


It now felt like the team were pulling what they needed to know.  I was no longer pushing.  We had flipped from push-to-pull.  That is called the tipping point.

And that is how health care systems engineering (HCSE) works.


Health care is a complex adaptive system, and what a health care systems engineer actually “designs” is a context-sensitive  incubator that nurtures the seeds of innovation that already exist in the system and encourages them to germinate, grow and become strong enough to establish themselves.

That is called “embedded improvement-by-design capability“.

And each incubator needs to be different – because each system is different.  One-solution-fits-all-problems does not work here just as it does not in medicine.  Each patient is both similar and unique.


Just as in medicine, first we need to diagnose the actual cause;  second we need to design some effective solutions; third we need to decide which design to implement and fourth we need to deliver it.

The how-to-do-it feels a bit counter-intuitive; if it did not, we would already be doing it. But the good news is that anyone can learn how to do HCSE.

As we approach the end of 2018 it is a good time to look back and reflect on what has happened this year.

It has been my delight to have had the opportunity to work with front-line teams at University Hospital of North Midlands (UHNM) and to introduce them to the opportunity that health care systems engineering (HCSE) offers.

This was all part of a coordinated, cooperative strategy commissioned by the Staffordshire Clinical Commissioning Groups, and one area we were asked to look at was unscheduled care.

It was not my brief to fix problems.  I was commissioned to demonstrate how a systems engineer might approach them.  The first step was to raise awareness, then develop some belief and then grow some embedded capability – in the system itself.

The rest was up to the teams who stepped up to the challenge.  So what happened?

Winter is always a tough time for the NHS and especially for unscheduled care so let us have a look  and compare UHNM with NHS England as a whole – using the 4 hour A&E target yield – and over a longer time period of 7 years (so that we can see some annual cycles and longer term trends).

The A&E performance for the NHS in England as a whole has been deteriorating at an accelerating pace over the 7 years.  This is a system-wide effect and there are a multitude of plausible causes.

The current UHNM system came into being at the end of 2014 with the merger of the Stafford and Stoke Hospital Trusts – and although their combined A&E performance dropped below average for England – the chart above shows that it did not continue to slide.

The NHS across the UK had a very bad time in the winter of 2017/18 – with a double whammy of sequential waves of Flu B and Flu A not helping!

But look at what happened at UHNM since Feb 2018.  Something has changed for the better and this is a macro system effect.  There has been a positive deviation from the expectation with about a 15% improvement in A&E 4-hr yield.  That is outstanding!

Now, I would say that news is worth celebrating and shouting “Well done everyone!” and then asking “How was that achieved?” and “What can we all learn that we can take forward into 2019 and build on?”

Merry Christmas.

It is November 2018, the clocks have changed back to GMT, the trick-or-treats are done, the fireworks light the night skies and spook the hounds, and the seasonal aisles in the dwindling number of high street stores are already stocked for Christmas.

I have been a bit quiet on the blog front this year but that is because there has been a lot happening behind the scenes and I have had to focus.

One output of this is the recent publication of an article in Future Healthcare Journal on the topic of health care systems engineering (HCSE).  Click here to read the article and the rest of this excellent edition of FHJ that is dedicated to “systems”.

So, as we are back to the winter phase of the annual NHS performance cycle it is a good time to glance at the A&E Performance Radar and see who is doing well, and not-so-well.

Based on past experience, I was expecting Luton to be Top-of-the-Pops and so I was surprised (and delighted) to see that Barnsley have taken the lead.  And the chart shows that Barnsley has turned around a reasonable but sagging performance this year.

So I would be asking “What has happened at Barnsley that we can all learn from? What did you change and how did you know what and how to do that?”

To be sure, Luton is still in the top three and it is interesting to explore who else is up there and what their A&E performance charts look like.

The data is all available for anyone with a web-browser to view – here.

For completeness, this is the chart for Luton, and we can see that, although the last point is lower than Barnsley, the performance-over-time is more consistent and less variable. So who is better?

NB. This is a meaningless question and illustrates the unhelpful tactic of two-point comparisons with others, and with oneself. The better question is “Is my design fit-for-purpose?”

The question I have for Luton is different. “How do you achieve this low variation and how do you maintain it? What can we all learn from you?”

And I have some ideas how they do that because in a recent HSJ interview they said “It is all about the filters“.


What do they mean by filters?

A filter is an essential component of any flow design if we want to deliver high safety, high efficiency, high effectiveness, and high productivity.  In other words, a high quality, fit-4-purpose design.

And the most important flow filters are the “upstream” ones.

The design of our upstream flow filters is critical to how the rest of the system works.  Get it wrong and we can get a spiralling decline in system performance because we can unintentionally trigger a positive feedback loop.

Queues cause delays and chaos that consume our limited resources.  So, when we are chasing cost improvement programme (CIP) targets using the “salami slicer” approach, and combine that with poor filter design … we can unintentionally trigger the perfect storm and push ourselves over the catastrophe cliff into perpetual, dangerous and expensive chaos.

If we look at the other end of the NHS A&E league table we can see typical examples that illustrate this pattern.  I have used this one only because it happens to be bottom this month.  It is not unique.

All other NHS trusts fall somewhere between these two extremes … stable, calm and acceptable and unstable, chaotic and unacceptable.

Most display the stable and chaotic combination – the “Zone of Perpetual Performance Pain”.

So what is the fundamental difference between the outliers that we can all learn from? The positive deviants like Barnsley and Luton, and the negative deviants like Blackpool.  I ask this because comparing the extremes is more useful than laboriously exploring the messy, mass-mediocrity in the middle.

An effective upstream flow filter design is a necessary component, but it is not sufficient. Triage (= French for sorting) is OK but it is not enough.  The other necessary component is called “downstream pull” and omitting that element of the design appears to be the primary cause of the chronic chaos that drags trusts and their staff down.

It is not just an error of omission though; the current design is actually an error of commission. It is anti-pull, otherwise known as “push”.


This year I have been busy on two complicated HCSE projects … one in secondary care and the other in primary care.  In both cases the root cause of the chronic chaos is the same.  They are different systems but have the same diagnosis.  What we have revealed together is a “push-carveout” design which is the exact opposite of the “upstream-filter-plus-downstream-pull” design we need.

And if an engineer wanted to design a system to be chronically chaotic then it is very easy to do. Here is the recipe:

a) Set a high average utilisation target for all resources as a proxy for efficiency to ensure everything is heavily loaded. Something between 80% and 100% usually does the trick.

b) Set a one-size-fits-all delivery performance target that is not currently being achieved and enforce it punitively.  Something like “>95% of patients seen and discharged or admitted in less than 4 hours, or else …”.

c) Divvy up the available resources (skills, time, space, cash, etc) into ring-fenced pots.

Chronic chaos is guaranteed.  The Laws of Physics decree it.
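To give a feel for ingredient (a), here is a minimal sketch using the textbook single-server queue with random arrivals and random service times (the M/M/1 model).  That model is my assumption for illustration, and it is far simpler than a real A&E, but it shows how the average number waiting behaves as the utilisation target is pushed towards 100%.

```python
def mean_queue_length(utilisation):
    """Average number waiting in an M/M/1 queue: rho^2 / (1 - rho)."""
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be at least 0 and below 1")
    return utilisation ** 2 / (1 - utilisation)


# The queue grows slowly at first, then explodes as utilisation nears 100%.
for rho in (0.5, 0.8, 0.9, 0.95, 0.99):
    waiting = mean_queue_length(rho)
    print(f"{rho:.0%} utilised -> {waiting:.1f} waiting on average")
```

Going from 80% to 95% utilisation in this simple model multiplies the average queue by more than five, and that is before any ring-fencing is added.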


Unfortunately, the explanation of why this is the case is counter-intuitive, so it is actually better to experience it first, and then seek the explanation.  Reality first, reasoning second.

And, it is a bittersweet experience, so it needs to be done with care and compassion.

And that’s what I’ve been busy doing this year. Creating the experiences and then providing the explanations.  And if done gradually what then happens is remarkable and rewarding.

The FHJ article outlines one validated path to developing individual and organisational capability in health care systems engineering.

It is always a huge compliment to see an idea improved and implemented by inspired innovators.

Health care systems engineering (HCSE) brings together concepts from the separate domains of systems engineering and health care.  And one idea that emerged from this union is to regard the health care system as a living, evolving, adapting entity.

In medicine we have the concept of ‘vital signs’ … a small number of objective metrics that we can measure easily and quickly.  With these we can quickly assess the physical health of a patient and decide if we need to act, and when.

With a series of such measurements over time we can see the state of a patient changing … for better or worse … and we can use this to monitor the effect of our actions and to maintain the improvements we achieve.

For a patient, the five vital signs are conscious level, respiratory rate, pulse, blood pressure and temperature. To sustain life we must maintain many flows within healthy ranges and the most critically important is the flow of oxygen to every cell in the body.  Oxygen is carried by blood, so blood flow is critical.

So, what are the vital signs for a health care system where the flows are not oxygen and blood?  They are patients, staff, consumables, equipment, estate, data and cash.

The photograph shows a demonstration of a Vitals Dashboard for a part of the cancer care system in the ABMU health board in South Wales.  The inspirational innovators who created it are Imran Rao (left), Andy Jones (right) and Chris Jones (top left), and they are being supported by ABMU to do this as part of their HCSE training programme.

So well done guys … we cannot wait to hear how being better able to see the voice of your cancer system translates into improved care for patients, and improved working life for the dedicated NHS staff, and improved use of finite public resources.  Win-win-win.

In medicine we use checklists as aides-memoire because they help us to avoid errors of omission, especially in an emergency when we are stressed and less able to think logically.

One that everyone learns if they do a First Aid course is A.B.C. and it stands for Airway, Breathing, Circulation.  It is designed to remind us what to do first because everything that follows depends on it, and then what to do next, and so on.  Avoiding the errors of omission improves outcomes.


In the world of improvement we are interested in change-for-the-better and there are many models of change that we can use to remind us not to omit necessary steps.

One of these is called the Six Steps model (or trans-theoretical model to use the academic title) and it is usually presented as a cycle starting with a state called pre-contemplation.

This change model arose from an empirical study of people who displayed addictive behaviours (e.g. smoking, drinking, drugs etc) and specifically, those who had overcome them without any professional assistance.

The researchers compared the stories from the successful self-healers with the accepted dogma for the management of addictions, and they found something very interesting.  The dogma advocated action, but the stories showed that there were some essential steps before action; steps that should not be omitted.  Specifically, the contemplation and determination steps.

If corrective actions were started too early then the success rate was low.  When the pre-action steps were added the success rate went up … a lot!


The first step is to raise awareness which facilitates a shift from pre-contemplation to contemplation.  The second step is to provide information that gradually increases the pros for change and at the same time gradually decreases the cons for change.

If those phases are managed skillfully then a tipping point is reached where the individual decides to make the change and moves themselves to the third step, the determination or planning phase.

Patience and persistence are required.  The contemplation phase can last a long time.  It is the phase of exploration, evidence and explanation. It is preparing the ground for change and can be summed up in one word: Study.

Often the trigger for determination (i.e. Plan) and then action (i.e. Do) is relatively small because when we are close to the tipping point it does not take much to nudge us to step across the line.


And there is an aide-memoire we can use for this change cycle … one that is a bit easier to remember:

A = Awareness
B = Belief
C = Capability
D = Delivery
E = Excellence (+enjoyment, +evidence, +excitement, +engagement)

First we raise awareness of the issue.
Then we learn a solution is possible and that we can learn the know-how.
Then we plan the work.
Then we work the plan.
Then we celebrate what worked and learn from what did and what did not.

Experience shows that the process is not discrete and sequential and it cannot be project managed into defined time boxes.  Instead, it is a continuum and the phases overlap and blend from one to the next in a more fluid and adaptive way.


Raising awareness requires both empathy and courage because this issue is often treated as undiscussable, and even the idea of discussing it is undiscussable too. Taboo.

But for effective change we need to grasp the nettle, explore the current reality, and start the conversation.

The debate about how to sensibly report NHS metrics has been raging for decades.

So I am delighted to share the news that NHS Improvement have finally come out and openly challenged the dogma that two-point comparisons and red-amber-green (RAG) charts are valid methods for presenting NHS performance data.

Their rather good 147-page guide can be downloaded: HERE


The subject is something called a statistical process control (SPC) chart which sounds a bit scary!  The principle is actually quite simple:

Plot data that emerges over time as a picture that tells a story – #plotthedots

The main thrust of the guide is learning the ropes of how to interpret these pictures in a meaningful way and to avoid two traps (i.e. errors).

Trap #1 = Over-reacting to random variation.
Trap #2 = Under-reacting to non-random variation.

Both of these errors cause problems, but in different ways.


Over-reacting to random variation

Random variation is a fact of life.  No two days in any part of the NHS are the same.  Some days are busier/quieter than others.

Plotting the daily-arrivals-in-A&E dots for a trust somewhere in England gives us this picture.  (The blue line is the average and the purple histogram shows the distribution of the points around this average.)

Suppose we were to pick any two days at random and compare the number of arrivals on those two days? We could get an answer anywhere between an increase of 80% (250 to 450) and a decrease of 44% (450 to 250).

But if we look at the whole picture above we get the impression that, over time:

  1. There is an expected range of random-looking variation between about 270 and 380 that accounts for the vast majority of days.
  2. There are some occasional, exceptional days.
  3. There is the impression that average activity fell by about 10% in around August 2017.

So, our two-point comparison method seriously misleads us – and if we react to the distorted message that a two-point comparison generates then we run the risk of increasing the variation and making the problem worse.
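The arithmetic behind those two extreme answers can be checked in a couple of lines.  The function name `percent_change` is just a label I have chosen; the numbers are the ones quoted above.

```python
def percent_change(first, second):
    """Percentage change from the first value to the second."""
    return (second - first) / first * 100


# The same pair of days gives opposite stories depending on the order.
print(round(percent_change(250, 450)))  # 80
print(round(percent_change(450, 250)))  # -44
```

Same two dots, two very different headlines, and neither says anything about the pattern over time.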

Lesson: #plotthedots


One of the downsides of SPC is the arcane and unfamiliar language that is associated with it … terms like ‘common cause variation‘ and ‘special cause variation‘.  Sadly, the authors at NHS Improvement have fallen into this ‘special language’ trap and therefore run the risk of creating a new clique.

The lesson here is that SPC is a specific, simplified application of a more generic method called a system behaviour chart (SBC).

The first SPC chart was designed by Walter Shewhart in 1924 for one purpose and one purpose only – for monitoring the output quality of a manufacturing process in terms of how well the product conformed to the required specification.

In other words: SPC is an output quality audit tool for a manufacturing process.

This has a number of important implications for the design of the SPC tool:

  1. The average is not expected to change over time.
  2. The distribution of the random variation is expected to be bell-shaped.
  3. We need to be alerted to sudden shifts.

Shewhart’s chart was designed to detect early signs of deviation of a well-performing manufacturing process.  To detect possible causes that were worth investigating and minimise the adverse effects of over-reacting or under-reacting.
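As a rough illustration of how the process limits on such a chart are commonly calculated, here is a minimal sketch of the individuals (XmR) form, where the limits sit at the average plus or minus 2.66 times the average moving range.  I am assuming the XmR form here, and the daily arrival figures are invented for the example.

```python
def xmr_limits(values):
    """Return (lower limit, average, upper limit) for an XmR chart."""
    average = sum(values) / len(values)
    # Moving range = absolute difference between consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    average_mr = sum(moving_ranges) / len(moving_ranges)
    spread = 2.66 * average_mr  # conventional XmR scaling constant
    return average - spread, average, average + spread


def outside_limits(values):
    """Indexes of points that fall outside the process limits."""
    lower, _, upper = xmr_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical daily A&E arrivals (illustrative numbers only).
arrivals = [310, 335, 290, 350, 320, 305, 340, 315, 330, 300]
lower, average, upper = xmr_limits(arrivals)
print(f"average {average:.0f}, limits {lower:.0f} to {upper:.0f}")
print("points to investigate:", outside_limits(arrivals))
```

Points inside the limits are treated as random variation and left alone; points outside are the ones worth a conversation.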


However,  for many reasons, the tool we need for measuring the behaviour of healthcare processes needs to be more sophisticated than the venerable SPC chart.  Here are three of them:

  1. The average is expected to change over time.
  2. The distribution of the random variation is not expected to be bell-shaped.
  3. We need to be alerted to slow drifts.

Under-Reacting to Non-Random Variation

Small shifts and slow drifts can have big cumulative effects.

Suppose I am an NHS service manager and I have a quarterly performance target to meet, so I have asked my data analyst to prepare a RAG chart to review my weekly data.

The quarterly target I need to stay below is 120 and my weekly RAG chart is set to show green when less than 108 (10% below target) and red when more than 132 (10% above target) because I know there is quite a lot of random week-to-week variation.
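The colouring rule just described is simple enough to write down.  This is only a sketch of the rule as stated; the names `rag`, `GREEN_BELOW` and `RED_ABOVE` are mine.

```python
TARGET = 120
GREEN_BELOW = TARGET - TARGET // 10  # 108, i.e. 10% below target
RED_ABOVE = TARGET + TARGET // 10    # 132, i.e. 10% above target


def rag(weekly_value):
    """Colour a weekly value using the thresholds described above."""
    if weekly_value < GREEN_BELOW:
        return "green"
    if weekly_value > RED_ABOVE:
        return "red"
    return "amber"


for value in (100, 115, 125, 140):
    print(value, rag(value))
```

Notice that a week at 125 is over target yet still shows amber, so the colour alone does not say which side of the line I am on.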

On the left is my weekly RAG chart for the first two quarters and I am in-the-green for both quarters (i.e. under target).

Q: Do I need to do anything?

A: The first quarter just showed “greens” and “ambers” so I relaxed and did nothing. There are a few “reds” in the second quarter, but about the same number as the “greens” and lots of “ambers” so it looks like I am about on target. I decide to do nothing again.

At the end of Q3 I’m in big trouble!

The quarterly RAG chart has flipped from Green to Red and I am way over target for the whole quarter. I missed the bus and I’m looking for a new job!

So, would an SPC chart have helped me here?

Here it is for Q1 and Q2.  The blue line is the target and the green line is the average … so below target for both quarters, as the RAG chart said.

There was a dip in Q1 for a few weeks but it was not sustained and the rest of the chart looks stable (all the points inside the process limits).  So, “do nothing” seemed like a perfectly reasonable strategy. Now I feel even more of a victim of fortune!

So, let us look at the full set of weekly data for the financial year and apply our retrospectoscope.

This is just a plain weekly performance run chart with the target limit plotted as the blue line.

It is clear from this that there is a slow upward drift and we can see why our retrospective quarterly RAG chart flipped from green to red, and why neither our weekly RAG chart nor our weekly SPC chart alerted us in time to avoid it!

This problem is often called ‘leading by looking in the rear view mirror‘.

The variation we needed to see was not random, it was a slowly rising average, but it was hidden in the random variation and we missed it.  So we under-reacted and we paid the price.


This example illustrates another limitation of both RAG charts and SPC charts … they are both insensitive to small shifts and slow drifts when there is lots of random variation around, which there usually is.
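Here is a minimal sketch of how that can happen.  The numbers are invented: a start of 100, a drift of 0.8 per week, and a fixed week-to-week wobble of up to plus or minus 11 that sums to zero over each 13-week quarter.  They are not the data behind the charts, only an illustration of a small drift hiding inside a bigger wobble.

```python
TARGET = 120
# Assumed week-to-week wobble; it sums to zero over each 13-week quarter.
NOISE = [9, -7, 4, -11, 6, -3, 10, -8, 2, -5, 7, -9, 5]


def weekly_values(weeks=39, start=100.0, drift_per_week=0.8):
    """Weekly metric = start + slow drift + repeating wobble."""
    return [start + drift_per_week * week + NOISE[week % len(NOISE)]
            for week in range(weeks)]


def quarterly_means(values, weeks_per_quarter=13):
    """Average of each consecutive block of 13 weeks."""
    return [sum(values[i:i + weeks_per_quarter]) / weeks_per_quarter
            for i in range(0, len(values), weeks_per_quarter)]


values = weekly_values()
for quarter, mean in enumerate(quarterly_means(values), start=1):
    status = "under target" if mean < TARGET else "OVER target"
    print(f"Q{quarter}: mean {mean:.1f} ({status})")
```

The drift of 0.8 per week is tiny compared with the weekly wobble, yet over three quarters it adds up to more than 20 points, which is enough to carry the quarterly average across the target.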

So, is there a way to avoid this trap?

Yes. We need to learn to use the more powerful system behaviour charts and the systems engineering techniques and tools that accompany them.


But that aside, the rather good 147-page guide from NHS Improvement is a good first step for those still using two-point comparisons and RAG charts and it can be downloaded: HERE

A few years ago I had a rant about the dangers of the widely promoted mantra that 85% is the optimum average measured bed-occupancy target to aim for.

But ranting is annoying, ineffective and often counter-productive.

So, let us revisit this with some calm objectivity and disprove this Myth a step at a time.

The diagram shows the system of interest (SoI) where the blue box represents the beds, the coloured arrows are the patient flows, the white diamond is a decision and the dotted arrow is information about how full the hospital is (i.e. full/not full).

A new emergency arrives (red arrow) and needs to be admitted. If the hospital is not full the patient is moved to an empty bed (orange arrow), the medical magic happens, and some time later the patient is discharged (green arrow).  If there is no bed for the emergency request then we get “spillover” which is the grey arrow, i.e. the patient is diverted elsewhere (n.b. these are critically ill patients …. they cannot sit and wait).


This same diagram could represent patients trying to phone their GP practice for an appointment.  The blue box is the telephone exchange and if all the lines are busy then the call is dropped (grey arrow).  If there is a line free then the call is connected (orange arrow) and joins a queue (blue box) to be answered some time later (green arrow).

In 1917, a Danish mathematician/engineer called Agner Krarup Erlang was working for the Copenhagen Telephone Company and was grappling with this very problem: “How many telephone lines do we need to ensure that dropped calls are infrequent AND the switchboard operators are well utilised?”

This is the perennial quality-versus-cost conundrum. The Value-4-Money challenge. Too few lines and the quality of the service falls; too many lines and the cost of the service rises.

Q: Is there a V4M “sweet spot” and if so, how do we find it? Trial and error?

The good news is that Erlang solved the problem … mathematically … and the not-so-good news is that his equations are very scary to a non mathematician/engineer!  So this solution is not much help to anyone else.


Fortunately, we have a tool for turning scary-equations into easy-2-see-pictures; our trusty Excel spreadsheet. So, here is a picture called a heat-map, and it was generated from one of Erlang’s equations using Excel.

The Erlang equation is lurking in the background, safely out of sight.  It takes two inputs and gives one output.

The first input is the Capacity, which is shown across the top, and it represents the number of beds available each day (known as the space-capacity).

The second input is the Load (or offered load to use the precise term) which is down the left side, and is the number of bed-days required per day (e.g. if we have an average of 10 referrals per day each of whom would require an average 2-day stay then we have an average of 10 x 2 = 20 bed-days of offered load per day).

The output of the Erlang model is the probability that a new arrival finds all the beds are full and the request for a bed fails (i.e. like a dropped telephone call).  This average probability is displayed in the cell.  The colour varies between red (100% failure) and green (0% failure), with an infinite number of shades of red-yellow-green in between.
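For anyone who would rather peek behind the spreadsheet, here is a minimal sketch of the same calculation.  I am assuming the equation in question is the one usually called Erlang B (the loss formula), computed here with its standard step-by-step recurrence; the function name and the grid ranges (loads 16 to 20, beds 20 to 25) are my own choices.

```python
def erlang_b(beds, offered_load):
    """Probability that a new arrival finds every bed full (Erlang B)."""
    blocking = 1.0  # with zero beds every request is rejected
    for n in range(1, beds + 1):
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return blocking


# Offered load = average referrals per day x average length of stay.
referrals_per_day = 10
length_of_stay_days = 2
offered_load = referrals_per_day * length_of_stay_days  # 20 bed-days per day

# A small text version of the heat-map: loads down the side, beds across.
print("beds:     " + "  ".join(f"{beds:4d}" for beds in range(20, 26)))
for load in range(16, 21):
    row = "  ".join(f"{erlang_b(beds, load):4.0%}" for beds in range(20, 26))
    print(f"load {load}:  {row}")
```

Two inputs go in, one probability comes out, and the spreadsheet colours are just a friendlier way of reading the same table.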

We can now use our visual heat-map in a number of ways.

a) We can use it to predict the average likelihood of rejection given any combination of bed-capacity and average offered load.

Suppose the average offered load is 20 bed-days per day and we have 20 beds then the heat-map says that we will reject 16% of requests … on average (bottom left cell).  But how can that be? Why do we reject any? We have enough beds on average! It is because of variation. Requests do not arrive in a constant stream equal to the average; there is random variation around that average.  Critically ill patients do not arrive at hospital in a constant stream; so our system needs some resilience and if it does not have it then failures are inevitable and mathematically predictable.

b) We can use it to predict how many beds we need to keep the average rejection rate below an arbitrary but acceptable threshold (i.e. the quality specification).

Suppose the average offered load is 20 bed-days per day, and we want to have a bed available more than 95% of the time (less than 5% failures) then we will need at least 25 beds (bottom right cell).

c) We can use it to estimate the maximum average offered load for a given bed-capacity and required minimum service quality.

Suppose we have 22 beds and we want a quality of >=95% (failure <5%) then we would need to keep the average offered load below 17 bed-days per day (i.e. by modifying the demand and the length of stay because average load = average demand * average length of stay).
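Uses (b) and (c) are both simple searches over the same function.  The sketch below assumes the Erlang B loss formula again; the helper names and the 0.1 search step are mine.  One caveat: the exact rejection rate for 25 beds and a load of 20 works out at about 5.0%, so whether a search returns 25 or 26 depends on whether “less than 5%” is applied strictly or to the rounded figure that a heat-map cell displays.

```python
def erlang_b(beds, offered_load):
    """Probability that a new arrival finds every bed full (Erlang B)."""
    blocking = 1.0
    for n in range(1, beds + 1):
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return blocking


def beds_needed(offered_load, max_reject):
    """Smallest number of beds with a rejection rate strictly below max_reject."""
    beds = 0
    while erlang_b(beds, offered_load) >= max_reject:
        beds += 1
    return beds


def max_offered_load(beds, max_reject, step=0.1):
    """Largest load (to the nearest step) that keeps rejections below max_reject."""
    steps = 0
    while erlang_b(beds, (steps + 1) * step) < max_reject:
        steps += 1
    return steps * step


print("beds for load 20, under 5% rejected:", beds_needed(20, 0.05))
print("max load for 22 beds, under 5% rejected:",
      round(max_offered_load(22, 0.05), 1))
```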


There is a further complication we need to be mindful of though … the measured utilisation of the beds is related to the successful admissions (orange arrow in the first diagram) not to the demand (red arrow).  We can illustrate this with a complementary heat map generated in Excel.

For scenario (a) above we have an offered load of 20 bed-days per day, and we have 20 beds but we will reject 16% of requests so the accepted bed load is only 16.8 bed days per day  (i.e. (100%-16%) * 20) which is the reason that the average  utilisation is only 16.8/20 = 84% (bottom left cell).

For scenario (b) we have an offered load of 20 bed-days per day, and 25 beds and will only reject 5% of requests but the average measured utilisation is not 95%, it is only 76% because we have more beds (the accepted bed load is 95% * 20 = 19 bed-days per day and 19/25 = 76%).

For scenario (c) the average measured utilisation would be about 74%.
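The three utilisation figures can be reproduced from the rejection probability: measured utilisation is the accepted load divided by the number of beds.  As before, this is a sketch that assumes the Erlang B loss formula, and the function names are mine.

```python
def erlang_b(beds, offered_load):
    """Probability that a new arrival finds every bed full (Erlang B)."""
    blocking = 1.0
    for n in range(1, beds + 1):
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return blocking


def utilisation(beds, offered_load):
    """Average measured bed utilisation = accepted load / beds."""
    accepted_load = offered_load * (1 - erlang_b(beds, offered_load))
    return accepted_load / beds


# Scenarios (a), (b) and (c) as (beds, offered load).
for label, beds, load in (("a", 20, 20), ("b", 25, 20), ("c", 22, 17)):
    print(f"scenario ({label}): {beds} beds, load {load} -> "
          f"{utilisation(beds, load):.0%} utilised")
```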


So, now we see the problem more clearly … if we blindly aim for an average, measured, bed-utilisation of 85% with the untested belief that it is always the optimum … this heat-map says it is impossible to achieve and at the same time offer an acceptable quality (>95%).

We are trading safety for money and that is not an acceptable solution in a health care system.


So where did this “magic” value of 85% come from?

From the same heat-map perhaps?

If we search for the combination of >95% success (<5% fail) and 85% average bed-utilisation then we find it at the point where the offered load reaches 50 bed-days per day and we have a bed-capacity of 56 beds.

And if we search for the combination of >99% success (<1% fail) and 85% average utilisation then we find it with an average offered load of just over 100 bed-days per day and a bed-capacity around 130 beds.

H’mm.  “Houston, we have a problem“.


So, even in this simplified scenario the hypothesis that an 85% average bed-occupancy is a global optimum is disproved.

The reality is that the average bed-occupancy associated with delivering the required quality for a given offered load with a specific number of beds is almost never 85%.  It can range anywhere between 50% and 100%.  Erlang knew that in 1917.
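To see how the occupancy that goes with a fixed quality target shifts with scale, here is a minimal sketch. It again assumes the Erlang B loss model, and it assumes we run with just enough beds to keep rejections strictly below the target. The names `beds_needed` and `utilisation_at_quality` are illustrative.

```python
def beds_needed(offered_load, max_reject=0.05):
    """Smallest bed count whose Erlang B rejection rate is below max_reject.

    Returns (beds, rejection_rate). Assumes Poisson arrivals and no queueing.
    """
    beds, blocking = 0, 1.0
    while blocking >= max_reject:
        beds += 1
        blocking = offered_load * blocking / (beds + offered_load * blocking)
    return beds, blocking


def utilisation_at_quality(offered_load, max_reject=0.05):
    """Measured average utilisation when running with just enough beds."""
    beds, blocking = beds_needed(offered_load, max_reject)
    return offered_load * (1.0 - blocking) / beds


for load in (5, 20, 50, 100):
    beds, blocking = beds_needed(load)
    print(f"load={load:3d} beds={beds:3d} rejected={blocking:.1%} "
          f"utilisation={utilisation_at_quality(load):.0%}")
```

Because the threshold is applied strictly, the bed count can differ by one from a heat-map read by eye when the rejection rate sits very close to the threshold. The pattern is what matters: the occupancy that goes with the same quality target rises as the offered load rises.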


So, if a one-size-fits-all optimum measured average bed-occupancy assumption is not valid then how might we work out how many beds we need and predict what the expected average occupancy will be?

We would design the fit-4-purpose solution for each specific context …
… and to do that we need to learn the skills of complex adaptive system design …
… and that is part of the health care systems engineering (HCSE) skill-set.

 

One of the really, really cool things about the 1.3 kg of “ChimpWare” between our ears is the way it learns.

We have evolved the ability to predict the likely near-future based on just a small number of past experiences.

And we do that by creating stored mental models.

Not even the most powerful computers can do it as well as we do – and we do it without thinking. Literally. It is an unconscious process.

This ability to pro-gnose (=before-know) gave our ancestors a major survival advantage when we were wandering about on the savanna over 10 million years ago.  And we have used this amazing ability to build societies, mega-cities and spaceships.


But this capability is not perfect.  It has a flaw.  Our “ChimpOS” does not store a picture of reality like a digital camera; it stores a patchy and distorted perception of reality, and then fills in the gaps with guesses (i.e. gaffes).  And we do not notice – consciously.

The cognitive trap is set and sits waiting to be sprung.  And to trip us up.


Here is an example:

“Improvement implies change”

Yes. That is a valid statement because we can show that whenever improvement has been the effect, then some time before that a change happened.  And we can show that when there are no changes, the system continues to behave as it always has.  Status quo.

The cognitive trap is that our ChimpOS is very good at remembering temporal associations – for example an association between “improvement” and “change” because we remember in the present.  So, if two concepts are presented at the same time, and we spice-the-pie with a bit of strong emotion, then we are more likely to associate them. Which is OK.

The problem comes when we play back the memory … it can come back as …

“change implies improvement” which is not valid.  And we do not notice.

To prove it is not valid we just need to find one example where a change led to a deterioration; an unintended negative consequence, a surprising, confusing and disappointing failure to achieve our intended improvement.

An embarrassing gap between our intent and our impact.

And finding that evidence is not hard.  Failures and disappointments in the world of improvement are all too common.


And then we can fall into the same cognitive trap because we generalise from a single, bad experience and the lesson our ChimpOS stores for future reference is “change is bad”.

And forever afterwards we feel anxious whenever the idea of change is suggested.

It is a very effective survival tactic – for a hominid living on the African savanna 10 million years ago, and at risk of falling prey to sharp-fanged, hungry predators.  It is a less useful tactic in the modern world where the risk of being eaten-for-lunch is minimal, and where the pace of change is accelerating.  We must learn to innovate and improve to survive in the social jungle … and we are not well equipped!


Here is another common cognitive trap:

Excellence implies no failures.

Yes. If we are delivering a consistently excellent service then the absence of failures will be a noticeable feature.

No failures implies excellence.

This is not a valid inference.  If quality-of-service is measured on a continuum from Excrement-to-Excellent, then we can be delivering a consistently mediocre service, one that is barely adequate, and also have no failures.


The design flaw here is that our ChimpWare/ChimpOS memory system is lossy.

We do not remember all the information required to reconstruct an accurate memory of reality – because there is too much information.  So we distort, we delete and we generalise.  And we do that because when we evolved it was a good enough solution, and it enabled us to survive as a species, so the ChimpWare/ChimpOS genes were passed on.

We cannot reverse millions of years of evolution.  We cannot get a wetware or a software upgrade.  We need to learn to manage with the limitations of what we have between our ears.

And to avoid the cognitive traps we need to practice the discipline of bringing our unconscious assumptions up to conscious awareness … and we do that by asking carefully framed questions.

Here is another example:

A high-efficiency design implies high-utilisation of resources.

Yes, that is valid. Idle resources mean wasted resources, which means lower efficiency.

Q1: Is the converse also valid?
Q2: Is there any evidence that disproves the converse?

If high-utilisation does not imply high-efficiency, what are the implications of falling into this cognitive trap?  What is the value of measuring utilisation? Does it have a value?

These are useful questions.

When a system reaches the limit of its resilience, it does not fail gradually; it fails catastrophically.  Up until the point of collapse the appearance of stability is reassuring … but it is an illusion.

A drowning person kicks frantically until they are exhausted … then they sink very quickly.

Below is the time series chart that shows the health of the UK Emergency Health Care System from 2011 to the present.

The seasonal cycle is made obvious by the regular winter dips. The progressive decline in England, Wales and NI is also clear, but we can see that Scotland did something different in 2015 and reversed the downward trend and sustained that improvement.

Until the whole system failed in the winter of 2017/18. Catastrophically.

The NHS is a very complicated system so what hope do we have of understanding what is going on?


The human body is also a complicated system.

In the 19th Century, a profound insight into how the human body works was proposed by the French physiologist, Claude Bernard.

He talked about the stability of the milieu intérieur and his concept came to be called homeostasis: The principle that a self-regulating system can maintain its own stability over a wide range.  In other words, it demonstrates resilience to variation.

The essence of a homeostatic system is that the output is maintained using a compensatory feedback loop, one that is assembled by connecting sensors to processors to effectors. Input-Process-Output (IPO).

And to assess how much stress the whole homeostatic system is under, we do not measure the output (because that is maintained steady by the homeostatic feedback design), instead we measure how hard the stabilising feedback loop is working!


And, when the feedback loop reaches the limit of its ability to compensate, the whole system will fail.  Quickly. Catastrophically.  And when this happens in the human body we call this a “critical illness”.
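A toy model makes the point about what to measure. This is a deliberately crude sketch and not a physiological or NHS model: it assumes the loop can exactly cancel any stress up to a fixed limit, and all of the names and numbers are invented for illustration. It does not capture how fast a real collapse happens, only that the output gives no warning while the effort does.

```python
def homeostat(stress_levels, max_effort):
    """Toy static model of a compensating feedback loop.

    For each level of external stress the loop applies an equal and opposite
    effort, capped at max_effort. The output deviation is whatever stress is
    left uncompensated.
    """
    history = []
    for stress in stress_levels:
        effort = min(stress, max_effort)  # the loop cannot exceed its limit
        deviation = stress - effort       # zero while the loop can cope
        history.append((stress, effort, deviation))
    return history


LIMIT = 8  # hypothetical maximum compensating effort
for stress, effort, deviation in homeostat(range(0, 14, 2), LIMIT):
    print(f"stress={stress:2d} effort={effort / LIMIT:4.0%} of limit "
          f"output deviation={deviation}")
```

The output deviation stays at zero all the way up to the limit, so watching the output alone says nothing about how close the system is to failure. The effort column is the early-warning signal.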

Doctors know this.  Engineers know this.  But do those who decide and deliver health care policy know this?  The uncomfortable evidence above suggests that they might not.

The homeostatic feedback loop is the “inner voice” of the system.  In the NHS it is the collective voices of those at the point of care who sense the pressure and who are paddling increasingly frantically to minimize risk and to maintain patient safety.

And being deaf to that inner voice is a very dangerous flaw in the system design!


Once a complicated system has collapsed, then it is both difficult and expensive to resuscitate and recover, especially if the underpinning system design flaws are not addressed.

And, if we learn how to diagnose and treat these system design errors, then it is possible to “flip” the system back into stable and acceptable performance.

Surprisingly quickly.



It is that time of year – again.

Winter.

The NHS is struggling, front-line staff are having to use heroic measures just to keep the ship afloat, and less urgent work has been suspended to free up space and time to help man the emergency pumps.

And the finger-of-blame is being waggled by the army of armchair experts whose diagnosis is unanimous: “lack of cash caused by an austerity triggered budget constraint”.


And the evidence seems plausible.

The A&E performance data says that each year since 2009, the proportion of patients waiting more than 4 hours in A&Es has been increasing.  And the increase is accelerating. This is a progressive quality failure.

And health care spending since the NHS was born in 1948 shows a very similar accelerating pattern.    

So which is the chicken and which is the egg?  Or are they both symptoms of something else? Something deeper?


Both of these charts are characteristic of a particular type of system behaviour called a positive feedback loop.  And the cost chart shows what happens when someone attempts to control the cash by capping the budget:  It appears to work for a while … but the “pressure” is building up inside the system … and eventually the cash-limiter fails. Usually catastrophically. Bang!


The quality chart shows an associated effect of the “pressure” building inside the acute hospitals, and it is a very well understood phenomenon called an Erlang-Kingman queue.  It is caused by the inevitable natural variation in demand meeting a cash-constrained, high-resistance, high-pressure, service provider.  The effect is to amplify the natural variation and to create something much more dangerous and expensive: chaos.
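The way variation is amplified as a service is run closer to its limit can be illustrated with Kingman's approximation for the average wait in a single-server queue. This is a sketch of the general phenomenon under textbook assumptions, not a model of any actual hospital, and the parameter names are illustrative.

```python
def kingman_wait(utilisation, cv_arrivals, cv_service, mean_service_time):
    """Kingman's approximation for the mean queueing time at a single server.

    utilisation: average fraction of time the server is busy (0 <= u < 1)
    cv_arrivals, cv_service: coefficients of variation of the inter-arrival
    and service times (standard deviation divided by mean)
    """
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in [0, 1) for a stable queue")
    variability = (cv_arrivals ** 2 + cv_service ** 2) / 2
    return (utilisation / (1 - utilisation)) * variability * mean_service_time


# Same variation, same service time, increasing utilisation
for u in (0.5, 0.8, 0.9, 0.95, 0.99):
    print(f"utilisation={u:.0%} average wait={kingman_wait(u, 1, 1, 1):.1f}")
```

With no variation at all (both coefficients zero) the predicted wait is zero at any utilisation below 100%. With variation present, the wait climbs steeply as utilisation approaches 100%.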


The simple line-charts above show the long-term, aggregated  effects and they hide the extremely complicated internal structure and the highly complex internal behaviour of the actual system.

One technique that system engineers use to represent this complexity is a causal loop diagram or CLD.

The arrows are of two types; green indicates a positive effect, and red indicates a negative effect.

This simplified CLD is dominated by green arrows all converging on “Cost of Care”.  They are the positive drivers of the relentless upward cost pressure.

Health care is a victim of its own success.

So, if the cash is limited then the naturally varying demand will generate the queues, delays and chaos that have such a damaging effect on patients, providers and purses.

Safety and quality are adversely affected. Disappointment, frustration and anxiety are rife. Expectation is lowered.  Confidence and trust are eroded.  But costs continue to escalate because chaos is expensive to manage.

This system behaviour is what we are seeing in the press.

The cost-constraint has, paradoxically, had exactly the opposite effect, because it is treating the effect (the symptom) and ignoring the cause (the disease).


The CLD has one negative feedback loop that is linked to “Efficiency of Processes”.  It is the only one that counteracts all of the other positive drivers.  And it is the consequence of the “System Design”.

What this means is: To achieve all the other benefits without the pressures on people and purses, all the complicated interdependent processes required to deliver the evolving health care needs of the population must be proactively designed to be as efficient as technically possible.


And that is not easy or obvious.  Efficient design does not happen naturally.  It is hard work!  It requires knowledge of the Anatomy and Physiology of Systems and of the Pathology of Variation.  It requires understanding how to achieve effectiveness and efficiency at the same time as avoiding queues and chaos.  It requires that the whole system is continually and proactively re-designed to remain reliable and resilient.

And that implies it has to be done by the system itself; and that means the NHS needs embedded health care systems engineering know-how.

And when we go looking for that we discover a sequence of gaps.

An Awareness gap, a Belief gap and a Capability gap. ABC.

So the first gap to fill is the Awareness gap.

The New Year of 2018 has brought some unexpected challenges. Or were they?

We have belligerent bullies with their fingers on their nuclear buttons.

We have an NHS in crisis, with corridor-queues of the urgent, frail, elderly and unwell, and a month of cancelled elective operations.

And we have winter storms, fallen trees, fractured power-lines, and threatened floods – all being handled rather well by people who are trained to manage the unexpected.

Which is the title of this rather interesting book that talks a lot about HROs.

So what are HROs?


“H” stands for High.  “O” stands for Organisation.

What does R stand for?  Rhetoric? Rigidity? Resistance?

Watching the news might lead one to suggest these words would fit … but they are not the answer.

“R” stands for Reliability and “R” stands for Resilience … and they are linked.


Think of a global system that is so reliable that we all depend on it, every day.  The Global Positioning System or the Internet perhaps.  We rely on them because they serve a need and because they work. Reliably and resiliently.

And that was no accident.

Both the Internet and the GPS were designed and built to meet the needs of billions and to be reliable and resilient.  They were both created by an army of unsung heroes called systems engineers – who were just doing their job. The job they were trained to do.


The NHS serves a need – and often an urgent one, so it must also be reliable. But it is not.

The NHS needs to be resilient. It must cope with the ebb and flow of seasonal illness. But it does not.

And that is because the NHS has not been designed to be either reliable or resilient. And that is because the NHS has not been designed.  And that is because the NHS does not appear to have enough health care systems engineers trained to do that job.

But systems engineering is a mature discipline, and it works just as well inside health care as it does outside.


And to support that statement, here is evidence of what happened after a team of NHS clinicians and managers were trained in the basics of HCSE.

Monklands A&E Improvement

So the gap seems to be just an awareness/ability gap … which is a bridgeable one.


Who would like to train to be a Health Care Systems Engineer and to join the growing community of HCSE practitioners who have the potential to be the future unsung heroes of the NHS?

Click here if you are interested: http://www.ihcse.uk

PS. “Managing the Unexpected” is an excellent introduction to SE.

It had been some time since Bob and Leslie had chatted so an email from the blue was a welcome distraction from a complex data analysis task.

<Bob> Hi Leslie, great to hear from you. I was beginning to think you had lost interest in health care improvement-by-design.

<Leslie> Hi Bob, not at all.  Rather the opposite.  I’ve been very busy using everything that I’ve learned so far.  Its applications are endless, but I have hit a problem that I have been unable to solve, and it is driving me nuts!

<Bob> OK. That sounds encouraging and interesting.  Would you be able to outline this thorny problem?  I will help if I can.

<Leslie> Thanks Bob.  It relates to a big issue that my organisation is stuck with – managing urgent admissions.  The problem is that very often there is no bed available, but there is no predictability to that.  It feels like a lottery; a quality and safety lottery.  The clinicians are clamoring for “more beds” but the commissioners are saying “there is no more money“.  So the focus has turned to reducing length of stay.

<Bob> OK.  A focus on length of stay sounds reasonable.  Reducing that can free up enough beds to provide the necessary space-capacity resilience to dramatically improve the service quality.  So long as you don’t then close all the “empty” beds to save money, or fall into the trap of believing that 85% average bed occupancy is the “optimum”.

<Leslie> Yes, I know.  We have explored all of these topics before.  That is not the problem.

<Bob> OK. What is the problem?

<Leslie> The problem is demonstrating objectively that the length-of-stay reduction experiments are having a beneficial impact.  The data seems to say they are, and the senior managers are trumpeting the success, but the people on the ground say they are not. We have hit a stalemate.


<Bob> Ah ha!  That old chestnut.  So, can I first ask what happens to the patients who cannot get a bed urgently?

<Leslie> Good question.  We have mapped and measured that.  What happens is the most urgent admission failures spill over to commercial service providers, who charge a fee-per-case and we have no choice but to pay it.  The Director of Finance is going mental!  The less urgent admission failures just wait in a queue-in-the-community until a bed becomes available.  They are the ones who are complaining the most, so the Director of Governance is also going mental.  The Director of Operations is caught in the cross-fire and the Chief Executive and Chair are doing their best to calm frayed tempers and to referee the increasingly toxic arguments.

<Bob> OK.  I can see why a “Reduce Length of Stay Initiative” would tick everyone’s Nice If box.  So, the data analysts are saying “the length of stay has come down since the Initiative was launched” but the teams on the ground are saying “it feels the same to us … the beds are still full and we still cannot admit patients“.

<Leslie> Yes, that is exactly it.  And everyone has come to the conclusion that demand must have increased so it is pointless to attempt to reduce length of stay because when we do that it just sucks in more work.  They are feeling increasingly helpless and hopeless.

<Bob> OK.  Well, the “chronic backlog of unmet need” issue is certainly possible, but your data will show if admissions have gone up.

<Leslie> I know, and as far as I can see they have not.

<Bob> OK.  So I’m guessing that the next explanation is that “the data is wonky“.

<Leslie> Yup.  Spot on.  So, to counter that the Information Department has embarked on a massive push on data collection and quality control and they are adamant that the data is complete and clean.

<Bob> OK.  So what is your diagnosis?

<Leslie> I don’t have one, that’s why I emailed you.  I’m stuck.


<Bob> OK.  We need a diagnosis, and that means we need to take a “history” and “examine” the process.  Can you tell me the outline of the RLoS Initiative?

<Leslie> We knew that we would need a baseline to measure from so we got the historical admission and discharge data and plotted a Diagnostic Vitals Chart®.  I have learned something from my HCSE training!  Then we planned the implementation of a visual feedback tool that would show ward staff which patients were delayed so that they could focus on “unblocking” the bottlenecks.  We then planned to measure the impact of the intervention for three months, and then we planned to compare the average length of stay before and after the RLoS Intervention with a big enough data set to give us an accurate estimate of the averages.  The data showed a very obvious improvement, a highly statistically significant one.

<Bob> OK.  It sounds like you have avoided the usual trap of just relying on subjective feedback, and now have a different problem because your objective and subjective feedback are in disagreement.

<Leslie> Yes.  And I have to say, getting stuck like this has rather dented my confidence.

<Bob> Fear not Leslie.  I said this is an “old chestnut” and I can say with 100% confidence that you already have what you need in your T4 kit bag.

<Leslie> Tee-Four?

<Bob> Sorry, a new abbreviation. It stands for “theory, techniques, tools and training“.

<Leslie> Phew!  That is very reassuring to hear, but it does not tell me what to do next.

<Bob> You are an engineer now Leslie, so you need to don the hard-hat of Improvement-by-Design.  Start with your Needs Analysis.


<Leslie> OK.  I need a trustworthy tool that will tell me if the planned intervention has had a significant impact on length of stay, for better or worse or not at all.  And I need it to tell me that quickly so I can decide what to do next.

<Bob> Good.  Now list all the things that you currently have that you feel you can trust.

<Leslie> I do actually trust that the Information team collect, store, verify and clean the raw data – they are really passionate about it.  And I do trust that the front line teams are giving accurate subjective feedback – I work with them and they are just as passionate.  And I do trust the systems engineering “T4” kit bag – it has proven itself again-and-again.

<Bob> Good, and I say that because you have everything you need to solve this, and it sounds like the data analysis part of the process is a good place to focus.

<Leslie> That was my conclusion too.  And I have looked at the process, and I can’t see a flaw. It is driving me nuts!

<Bob> OK.  Let us take a different tack.  Have you thought about designing the tool you need from scratch?

<Leslie> No. I’ve been using the ones I already have, and assume that I must be using them incorrectly, but I can’t see where I’m going wrong.

<Bob> Ah!  Then, I think it would be a good idea to run each of your tools through a verification test and check that they are fit-4-purpose in this specific context.

<Leslie> OK. That sounds like something I haven’t covered before.

<Bob> I know.  Designing verification test-rigs is part of the Level 2 training.  I think you have demonstrated that you are ready to take the next step up the HCSE learning curve.

<Leslie> Do you mean I can learn how to design and build my own tools?  Special tools for specific tasks?

<Bob> Yup.  All the techniques and tools that you are using now had to be specified, designed, built, verified, and validated. That is why you can trust them to be fit-4-purpose.

<Leslie> Wooohooo! I knew it was a good idea to give you a call.  Let’s get started.


[Postscript] And Leslie, together with the other stakeholders, went on to design the tool that they needed and to use the available data to dissolve the stalemate.  And once everyone was on the same page again they were able to work collaboratively to resolve the flow problems, and to improve the safety, flow, quality and affordability of their service.  Oh, and to know for sure that they had improved it.

One of the quickest and easiest ways to kill an improvement initiative stone dead is to label it as a “cost improvement program” or C.I.P.

Everyone knows that the biggest single contributor to cost is salaries.

So cost reduction means head count reduction which means people lose their jobs and their livelihood.

Who is going to sign up to that?

It would be like turkeys voting for Xmas.

There must be a better approach?

Yes. There is.


Over the last few weeks, groups of curious skeptics have experienced the immediate impact of systems engineering theory, techniques and tools in a health care context.

They experienced queues, delays and chaos evaporate in front of their eyes … and it cost nothing to achieve. No extra resources. No extra capacity. No extra cash.

Their reaction was “surprise and delight”.

But … it also exposed a problem.  An undiscussable problem.


Queues and chaos require expensive resources to manage.

We call them triagers, progress-chasers, and fire-fighters.  And when the queues and chaos evaporate then their jobs do too.

The problem is that the very people who are needed to make the change happen are the ones who become surplus-to-requirement as a result of the change.

So change does not happen.

It would be like turkeys voting for Xmas.


The way around this impasse is to anticipate the effect and to proactively plan to re-invest the resource that is released.  And to re-invest it doing more interesting and more worthwhile jobs than queue-and-chaos management.

One opportunity for re-investment is called time-buffering which is an effective way to improve resilience to variation, especially in an unscheduled care context.

Another opportunity for re-investment is tail-gunning the chronic backlogs until they are down to a safe and sensible size.

And many complain that they do not have time to learn about improvement because they are too busy managing the current chaos.

So, another opportunity for re-investment is training – oneself first and then others.


R.I.P.    C.I.P.

The NHS appears to be descending into a frenzy of fear as the winter looms and everyone says it will be worse than the last one and the one before that.

And with that we-are-going-to-fail mindset, it almost certainly will.

Athletes do not start a race believing that they are doomed to fail … they hold a belief that they can win the race and that they will learn and improve even if they do not. It is a win-win mindset.

But to succeed in sport requires more than just a positive attitude.

It also requires skills, training, practice and experience.

The same is true in healthcare improvement.


That is not the barrier though … the barrier is disbelief.

And that comes from not having experienced what it is like to take a system that is failing and transform it into one that is succeeding.

Logically, rationally, enjoyably and surprisingly quickly.

And, the widespread disbelief that it is possible is paradoxical because there are plenty of examples where others have done exactly that.

The disbelief seems to be “I do not believe that will work in my world and in my hands!”

And the only way to dismantle that barrier-of-disbelief is … by doing it.


How do we do that?

The emotionally safest way is in a context that is carefully designed to enable us to surface the unconscious assumptions that are the bricks in our individual Barriers of Disbelief.

And to discard the ones that do not pass a Reality Check, and keep the ones that are OK.

This Disbelief-Busting design has been proven to be effective, as evidenced by the growing number of individuals who are learning how to do it themselves, and how to inspire, teach and coach others to do so as well.


So, if you would like to flip disbelief-and-hopeless into belief-and-hope … then the door is here.

It is always rewarding when separate but related ideas come together and go “click”.

And this week I had one of those “ah ha” moments while attempting to explain how the process of engagement works.

Many years ago I was introduced to the conscious-competence model of learning which I found really insightful.  Sometime later I renamed it as the awareness-ability model because the term competence felt too judgmental.

The idea is that when we learn we all start from a position of being unaware of our inability.

A state called blissful ignorance.

And it is only when we try to do something that we become aware of what we cannot do; which can lead to temper tantrums!

As we concentrate and practice our ability improves and we enter the zone of know how.  We become able to demonstrate what we can do, and explain how we are doing it.

The final phase comes when it becomes so habitual that we forget how we learned our skill – it has become second nature.


Some years later I was introduced to the Nerve Curve which is the emotional roller-coaster ride that accompanies change.  Any form of change.

A five-step model was described in the context of bereavement by psychiatrist Elisabeth Kübler-Ross in her 1969 book “On Death & Dying: What the Dying Have to Teach Doctors, Nurses, Clergy and their Families”.

More recently this has been extended and applied by authors such as William Bridges and John Fisher in the less emotionally traumatic contexts called transitions.

The characteristic sequence of emotions triggered by external events is:

  • shock
  • denial
  • frustration
  • blame
  • guilt
  • depression
  • acceptance
  • engagement
  • excitement.

The important message in both of these models is that we can get stuck along the path of transition, and we can disengage at several points, signalling to others that we have come off the track.  When we do that we exhibit behaviours such as denial, disillusionment and hostility.


More recently I was introduced to the work of the late Chris Argyris and specifically the concept of “defensive reasoning“.

The essence of the concept:  As we start to become aware of a gap between our intentions and our impact, then we feel threatened and our natural reaction is defensive.  This is the essence of the behaviour called “resistance to change”, and it is interesting to note that “smart” people are particularly adept at it.


These three concepts are clearly related in some way … but how?


As a systems engineer I am used to cyclical processes and the concepts of wavelength, amplitude, phase and offset, and I found myself looking at the Awareness-Ability cycle and asking:

“How could that cycle generate the characteristic shape of the transition curve?”

Then the Argyris idea of the gap between intent and impact popped up and triggered another question:

“What if we look at the gap between our ability and our awareness?”

So, I conducted a thought experiment and imagined myself going around the cycle – and charting my ability, awareness and emotional state along the way … and this sketch emerged. Ah ha!

When my awareness exceeded my ability I felt disheartened. That is the defensive reasoning that Chris Argyris talks about, the emotional barrier to self-improvement.


Ability – Awareness = Engagement


This suggested to me that the process of building self-engagement requires opening the ability-versus-awareness gap a little-bit-at-a-time, sensing the emotional discomfort, and then actively releasing the tension by learning a new concept, principle, technique or tool (and usually all four).

Eureka!

I wonder if the same strategy would work elsewhere?