The Army Memo to Stop Using Mil HDBK 217
Over 20 years ago the Assistant Secretary of the Army directed the Army not to use MIL HDBK 217 in requests for proposals, even for guidance. Exceptions are by waiver only.
217 is still around and routinely called out. That is a lot of waivers.
Why are 217 and other parts count database prediction packages still in use? Let’s explore the memo a bit more, plus ponder what maintains the popularity of 217 and its ilk.
The 1995 Decker Memo
Gilbert F. Decker signed a short policy statement in a memorandum. The subject is “Policy on Incorporating a Performance-Based Approach to Reliability in Request for Proposals (RFPs)”. He states that reliability requirements should include:
1) quantified reliability requirements and allowable uncertainties.
Reliability as a metric is a quantifiable requirement and includes the probability of survival, duration, environment, and function. I do not agree that a requirement or goal should include ‘allowable uncertainties’, as the requirement is the desired state for the population. That said, using confidence levels for sampling is common practice, since testing 100% of units to failure is not practicable.
2) failure definitions and thresholds
This is part of the reliability metric: define the function, and when the item no longer performs that function, that is a failure.
3) life-cycle conditions
Again, already part of the reliability metric.
The memo goes on to include a rationale that including failure definitions and life-cycle conditions is necessary for a fully defined reliability requirement. Duh! See the definition of reliability and its four essential elements.
It seems the memo has missed the duration element of the reliability metric – that is unfortunate.
MIL HDBK 217 and the Memo
My favorite part of the memo, and the reason a good friend sent it to me, is that it specifically states that RFPs should not use 217 at all, and should not request the use of parts count predictions in general.
While there are plenty of reasons to avoid 217 and similar, the memo states they have been ‘shown to be unreliable and its use can lead to erroneous and misleading reliability prediction.’
Even though the 217 document itself says essentially the same, not to be used to predict field performance, the Assistant Secretary of the Army felt the point needed reiterating.
Why does 217 Remain in Use?
The easy answer is there are plenty of non-thinking people in the world who just want something easy to ‘get reliability’ done.
The harder answer is we all want to predict the future reliability performance for our products. We want to know what will fail and when. That is difficult to do well. It costs money, time, resources, and is still difficult to get accurate results.
Doing what has been done before (predictions) and what is easy (parts count predictions) just seems too compelling for too many. Thus 217 remains part of RFPs and too many reliability programs.
What Can We Do Today?
Well, to start, use performance-based reliability requirements. When setting requirements, use reliability: a function or set of functions that should survive a duration(s) with some probability given a set of environmental conditions. It is complete and performance-based.
When confronted with the results of a prediction, especially a parts count or 217 based prediction, refrain from laughing as you toss out (delete) the bit of worthless information. Instead, restate the request for meaningful information concerning what will fail and when.
Physics of failure modeling, accelerated life testing, and other approaches provide a means to estimate future performance. Tallying up failure rates does not.
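To make the contrast concrete, here is a minimal sketch of what a parts count prediction boils down to. The part list and failure rates below are invented for illustration (they are not taken from 217 or any other database), and the function name is hypothetical.

```python
import math

# Hypothetical bill of materials: (part type, quantity, failures per million hours).
# These rates are made up for illustration; they are NOT from MIL HDBK 217.
BILL_OF_MATERIALS = [
    ("resistor", 40, 0.002),
    ("ceramic capacitor", 25, 0.004),
    ("microcontroller", 1, 0.05),
    ("connector", 4, 0.01),
]


def parts_count_failure_rate(bom):
    """Sum quantity * table rate over all parts (failures per million hours)."""
    return sum(quantity * rate for _, quantity, rate in bom)


total_rate = parts_count_failure_rate(BILL_OF_MATERIALS)

# The "prediction" is just the reciprocal of the tally.
mtbf_hours = 1e6 / total_rate

# Reliability over one year of continuous use, which is only meaningful
# if the hazard rate really is constant (the exponential assumption).
one_year_hours = 8760
reliability_one_year = math.exp(-total_rate * one_year_hours / 1e6)

print(f"Total failure rate: {total_rate:.3f} per million hours")
print(f"Parts count MTBF:   {mtbf_hours:,.0f} hours")
print(f"R(1 year):          {reliability_one_year:.4f}")
```

Notice what never enters the calculation: the design, the loads, the use environment over time, or any failure mechanism. The constant failure rate assumption is also baked in, since a single summed rate only makes sense if nothing wears out.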
Now if only we could find (or create) memos outlawing the use of MTBF!
Grady says
Fred:
217 was to be used as a guide and the predictions validated by testing and growth. When reliability tests get expensive, most companies will cut the costs for R&M tests and operate the equipment to the product maturity point. If we flow down realistic R&M requirements and data is available, better prediction models can be built. As for MTBF, at the product system level it can be useful, because some components never see the MTBF if the components are replaced. R&M practitioners need to be clear in calculating the mean and specify what distribution is used: normal, exponential, or Weibull.
Fred Schenkelberg says
Hi Grady,
I agree with every point except one… As you may suspect, I do not agree with the use of MTBF under any circumstance. Even with distributions noted, conditions listed, etc., it is so misused and misunderstood as to be absolutely worthless, IMHO.
217 and similar predictions have, by design, very limited application and again are so often misunderstood and misapplied as to render the practice dangerous.
Cheers,
Fred
Paul Franklin says
Predictions are difficult, especially in matters concerning the future. MTBF is almost always a problem, and 217 (and all other parts count methods) make assumptions that are not met. Two main issues come immediately to mind. First is the idea that failure rates are constant. They aren’t, but pretending they are makes the math easier. The other main defect is the assumption that you can get useful estimates of failure rates from a table. You can’t, because the failure modes that are actually present depend not only on the components you used, but how you design the product. This list could go on, and the conclusion is unchanged: parts count predictions are easy to do, and that doesn’t make them right or useful.
At the same time, physics of failure methods and reliability testing have their own pitfalls. Activation energy is rarely 0.7 eV (so acceleration factors can lead to the “Erroneous” equation), and physics of failure methods require both parameter estimates and models. So it’s necessary to be smart when using those methods as well. Still, data is data, and that’s a definite improvement over using a model whose assumptions just don’t match very much in the real world.
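As a rough illustration of the activation energy point, the sketch below computes the Arrhenius acceleration factor for a few activation energies. The use and stress temperatures (55 °C and 125 °C) and the list of activation energies are assumptions chosen for illustration, apart from the often-quoted 0.7 eV; the function name is hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration_factor(ea_ev, use_temp_c, stress_temp_c):
    """Arrhenius acceleration factor between a use and a stress temperature.

    ea_ev is the assumed activation energy in eV; temperatures are in Celsius.
    """
    use_k = use_temp_c + 273.15
    stress_k = stress_temp_c + 273.15
    exponent = (ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / use_k - 1.0 / stress_k)
    return math.exp(exponent)


# Assumed temperatures for illustration only.
USE_TEMP_C = 55.0
STRESS_TEMP_C = 125.0

for ea in (0.5, 0.7, 0.9):
    af = arrhenius_acceleration_factor(ea, USE_TEMP_C, STRESS_TEMP_C)
    print(f"Ea = {ea:.1f} eV -> acceleration factor of about {af:.0f}")
```

With these assumed temperatures, moving the activation energy by 0.2 eV in either direction changes the acceleration factor by more than a factor of three, so the assumed Ea deserves as much scrutiny as the test data.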
Grady says
Paul:
The problem with canceling Mil Standards is: who cancelled the standard, what did they replace it with, and has the individual ever used the standards they cancel? R&M practitioners need to know what replaces what has been cancelled to avoid having no technical standard as a guide.
Fred Schenkelberg says
I would make the argument that you and others do not require standards to define and achieve the desired system reliability performance. Standards have long held back the discussion on what is and is not reliability and how to actually achieve it with real systems.
Canceled standards do not require replacement other than with solid engineering that achieves the desired reliability performance.
Cheers,
Fred
Paul Franklin says
Amen.
Mark Woodruff says
“Predictions are difficult, especially in matters concerning the future.” Rolling with laughter. Was that meant to be as funny as it was? If not, oops. My bad.
Fred Schenkelberg says
good one – from what I understand that is pretty close to an old Danish proverb and sometimes attributed to Niels Bohr or Yogi Berra. cheers, Fred (and enjoy the laughter)
Alan Pettitt says
I overheard this response by someone who should have known better.
“We have 3 units on test each of which has achieved 1200hrs without failure therefore we can quote 6000hrs MTBF”
Not sure if I was more confused by the lack of understanding or the maths.
Fred Schenkelberg says
Amazing…. I’ve been relaying this story all day – where to start? Hope you found it a good moment to educate the person.
Cheers,
Fred
Thomas Reiter says
“We have 3 units on test each of which has achieved 1200hrs without failure therefore we can quote 6000hrs MTBF”
--> 3 x 1200 = 3600 test hours.
3600 test hours with no fails --> demonstrated MTBF ~ 5200h
(You will need the chi square distribution in order to calculate MTBF based on tests with no fails)
--> quoting 6000h is arguable, in my opinion.
Thomas
Fred Schenkelberg says
Hi Thomas,
Agreed, quoting 6k MTBF is rather dubious. Instead of using Mil Hdbk 217 there is some test, not a very good one, yet some testing. Plenty of holes in the logic that leads to the 6k. Assuming a constant hazard rate, using an upper bound, and rounding up are just three that come to mind as a basis for being deceitful.
Cheers,
Fred
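For readers who want to check Thomas's figure, here is a minimal sketch of the zero-failure chi-square bound, assuming a constant hazard rate (the very assumption questioned above). With zero failures the chi-square quantile has 2 degrees of freedom and a closed form, so no statistics library is needed. The function names are hypothetical, and the 50% confidence level is an inference from the ~5200 h figure, not something stated in the comment.

```python
import math


def mtbf_lower_bound_zero_failures(total_test_hours, confidence):
    """One-sided lower bound on MTBF after a test with zero failures.

    Assumes an exponential (constant hazard rate) life distribution.
    Uses MTBF_lower = 2T / chi2(confidence, 2 dof), where the 2-dof
    chi-square quantile equals -2 * ln(1 - confidence).
    """
    if total_test_hours <= 0:
        raise ValueError("total_test_hours must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    chi2_quantile = -2.0 * math.log(1.0 - confidence)
    return 2.0 * total_test_hours / chi2_quantile


def implied_confidence(total_test_hours, claimed_mtbf_hours):
    """Confidence level at which a claimed MTBF equals the zero-failure bound."""
    return 1.0 - math.exp(-total_test_hours / claimed_mtbf_hours)


total_hours = 3 * 1200  # three units, 1200 hours each, no failures

for level in (0.5, 0.6, 0.9):
    bound = mtbf_lower_bound_zero_failures(total_hours, level)
    print(f"{level:.0%} confidence -> MTBF lower bound of about {bound:.0f} h")

claimed = 6000
print(f"A {claimed} h claim implies about "
      f"{implied_confidence(total_hours, claimed):.0%} confidence")
```

Under these assumptions the ~5200 h figure corresponds to only 50% confidence, and a 6000 h claim to less than that. At 90% confidence the same 3600 test hours support a lower bound of only about 1560 h.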