Type II Error

Saturday, March 19, 2016

figure_pointing_out_chart_data_150_clr_8005It was the time for Bob and Leslie’s regular Improvement Science coaching session.

<Leslie> Hi Bob, how are you today?

<Bob> I am getting over a winter cold but otherwise I am good.  And you?

<Leslie> I am OK and I need to talk something through with you because I suspect you will be able to help.

<Bob> OK. What is the context?

<Leslie> Well, one of the projects that I am involved with is looking at the elderly unplanned admission stream which accounts for less than half of our unplanned admissions but more than half of our bed days.

<Bob> OK. So what were you looking to improve?

<Leslie> We want to reduce the average length of stay so that we free up beds to provide resilient space-capacity to ease the 4-hour A&E admission delay niggle.

<Bob> That sounds like a very reasonable strategy.  So have you made any changes and measured any improvements?

<Leslie> We worked through the 6M Design® sequence. We studied the current system, diagnosed some time traps and bottlenecks, redesigned the ones we could influence, modified the system, and continued to measure to monitor the effect.

<Bob> And?

<Leslie> It feels better but the system behaviour charts do not show an improvement.

<Bob> Which charts, specifically?

<Leslie> The BaseLine XmR charts of average length of stay for each week of activity.

<Bob> And you locked the limits when you made the changes?

<Leslie> Yes. And there still were no red flags. So that means our changes have not had a significant effect. But it definitely feels better. Am I deluding myself?

<Bob> I do not believe so. Your subjective assessment is very likely to be accurate. Our Chimp OS 1.0 is very good at some things! I think the issue is with the tool you are using to measure the change.

<Leslie> The XmR chart?  But I thought that was THE tool to use?

<Bob> Like all tools it is designed for a specific purpose.  Are you familiar with the term Type II Error.

<Leslie> Doesn’t that come from research? I seem to remember that is the error we make when we have an under-powered study.  When our sample size is too small to confidently detect the change in the mean that we are looking for.

<Bob> A perfect definition!  The same error can happen when we are doing before and after studies too.  And when it does, we see the pattern you have just described: the process feels better but we do not see any red flags on our BaseLine© chart.

<Leslie> But if our changes only have a small effect how can it feel better?

<Bob> Because some changes have cumulative effects and we omit to measure them.

<Leslie> OMG!  That makes complete sense!  For example, if my bank balance is stable my average income and average expenses are balanced over time. So if I make a small-but-sustained improvement to my expenses, like using lower cost generic label products, then I will see a cumulative benefit over time to the balance, but not the monthly expenses; because the noise swamps the signal on that chart!

<Bob> An excellent analogy!

<Leslie> So the XmR chart is not the tool for this job. And if this is the only tool we have then we risk making a Type II error. Is that correct?

<Bob> Yes. We do still use an XmR chart first though, because if there is a big enough and fast enough shift then the XmR chart will reveal it.  If there is not then we do not give up just yet; we reach for our more sensitive shift detector tool.

<Leslie> Which is?

<Bob> I will leave you to ponder on that question.  You are a trained designer now so it is time to put your designer hat on and first consider the purpose of this new tool, and then create the outline a fit-for-purpose design.

<Leslie> OK, I am on the case!