The Magic Math of Meeting MTBF Requirements
I recently heard from a reader of NoMTBF. She wondered about a supplier’s argument that they meet the reliability, or MTBF, requirements. She was right to wonder.
Estimating the reliability performance of a new design is difficult.
There are good and better practices for justifying claims about future reliability performance. Likewise, there are just plain poor approaches. Plus, there are approaches that should never be used.
The Vendor’s Calculation to Support the Claim They Meet the Reliability Objective
Let’s say we contract with a vendor to create a navigation system for our vehicle. The specification includes functional requirements, the form factor, and a long list of other requirements. It also clearly states the reliability specification: let’s say the unit should survive 10 years of use in our vehicle with 95% probability. We provide the environmental and functional requirements in detail.
The vendor first converts the 95% probability of success over 10 years into MTBF, claiming they are ‘more familiar’ with MTBF. They ignore the requirement for the probability of success over the first month of operation. Likewise, they ignore the 5-year reliability target, or, as they would convert it, the 5-year MTBF requirement.
[Note: if you were tempted to calculate the equivalent MTBF, please don’t. It’s not useful, nor relevant, and a poor practice. Suffice it to say it would be a large and meaningless number]
RED FLAG Converting the requirement into MTBF suggests they may be making simplifying assumptions, such as a constant failure rate. Those assumptions may permit easier use of estimation, modeling, and testing approaches.
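To see what the conversion throws away, here is a minimal sketch, assuming Weibull time-to-failure models with hypothetical shape values, that fits three distributions to the same requirement (95% survival at 10 years) and compares them at 1 month and 5 years. The shape beta = 1 case is the constant failure rate that a single MTBF figure implies; the other two shapes are illustrative assumptions, not data from the vendor.

```python
import math

T_REQ = 10.0  # years, the duration in the requirement
R_REQ = 0.95  # required probability of surviving T_REQ


def weibull_reliability(t, beta, eta):
    """R(t) = exp(-(t/eta)^beta); beta = 1 is the constant failure rate case."""
    return math.exp(-((t / eta) ** beta))


def eta_for_requirement(beta, t=T_REQ, r=R_REQ):
    """Solve R(t) = r for the Weibull scale eta, given the shape beta."""
    return t / (-math.log(r)) ** (1.0 / beta)


# Hypothetical shapes: early-life failures, constant rate (what MTBF implies), wear-out
shapes = {
    "early life (beta=0.5)": 0.5,
    "constant rate (beta=1)": 1.0,
    "wear-out (beta=3)": 3.0,
}

results = {}
for label, beta in shapes.items():
    eta = eta_for_requirement(beta)
    results[label] = {
        "1 month": weibull_reliability(1 / 12, beta, eta),
        "5 years": weibull_reliability(5.0, beta, eta),
        "10 years": weibull_reliability(10.0, beta, eta),
    }
    print(f"{label:24s}" + "  ".join(f"{k}: {v:.5f}" for k, v in results[label].items()))
```

All three meet the 10-year requirement exactly, yet at 5 years they range from roughly 0.964 (early-life failures) to roughly 0.994 (wear-out), and at 1 month only the early-life case drops below 0.999. A single MTBF value cannot tell these apart, which is why the 1-month and 5-year requirements need their own evidence.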
The Vendor’s Approach to ‘Prove’ They Meet the MTBF Requirement
The vendor reported they met the reliability requirement using the following logic:
Of the 1,000 (actually more) components, we selected 6 at random for accelerated life testing (ALT). We estimated the lower 60% confidence bound on the probability of surviving 10 years given the ALT results. We then converted the ALT results to an MTBF for each part.
We then added the MIL-HDBK-217 failure rate estimate to the ALT result for each of the 6 parts.
RED FLAG This one has me wondering about the rationale for adding the failure rate from an ALT to the failure rate from a parts count prediction. It makes the failure rate higher. Maybe it was a means to add a bit of margin to cover the uncertainty? I’m not sure; do you have any idea why someone would do this? Are they assuming the ALT did not actually measure anything relevant or any specific failure mechanisms, or that it used a benign stress? ALT details were not provided.
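Returning to the first step, the “lower 60% confidence bound” deserves a closer look. Since the ALT details were not provided, the following is only one hypothetical way such a bound could arise. It is a minimal sketch, assuming a zero-failure (success-run) test in which each of n units accumulates the equivalent of 10 years with no failures, giving the lower bound R_L = (1 - C)^(1/n). The sample size of 10 units and the function names are illustrative assumptions.

```python
import math


def success_run_lower_bound(n_units, confidence):
    """Lower one-sided bound on reliability when n_units all survive the test
    duration with zero failures: R_L = (1 - C)^(1/n)."""
    if n_units < 1 or not 0 < confidence < 1:
        raise ValueError("need n_units >= 1 and 0 < confidence < 1")
    return (1.0 - confidence) ** (1.0 / n_units)


def units_required(r_target, confidence):
    """Zero-failure sample size needed to demonstrate r_target at the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(r_target))


# Hypothetical: 10 units per component, each run to the equivalent of 10 years, no failures
for c in (0.60, 0.90):
    print(f"C = {c:.0%}: R_L = {success_run_lower_bound(10, c):.3f}, "
          f"units to show R = 0.95: {units_required(0.95, c)}")
```

With 10 units, the 60% bound is about 0.912 while the 90% bound is about 0.794, and demonstrating 0.95 with zero failures takes 18 units at 60% confidence versus 45 at 90%. At 60% confidence, the procedure is expected to overstate the true reliability roughly 40% of the time.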
The Approach Gets Weird Here
Then we used a 217 parts count prediction, along with the 6 modified component failure rates, to estimate the system failure rate, and estimated the MTBF with a simple inversion. The vendor then claimed the system design will meet the field reliability performance requirements.
RED FLAG MIL-HDBK-217F, section 3.3, states:
Hence, a reliability prediction should never be assumed to represent the expected field reliability …
If you are going to use a standard, any standard, read it. Read it to understand when and why it is, or is not, useful.
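To make the assumptions in the vendor’s chain visible, here is a minimal sketch of that arithmetic. Every number is hypothetical (the vendor’s figures were not shared), and the structure is a reconstruction from the description above, not the vendor’s actual worksheet. It assumes constant failure rates, a series system in which rates simply add, continuous operation at 8,760 hours per year, and the step of adding each part’s 217 estimate to its ALT-derived rate.

```python
import math

HOURS_PER_YEAR = 8760  # assumes continuous operation; the real duty cycle may differ

# Hypothetical failure rates in failures per million hours (FPMH), not the vendor's data
parts_count_fpmh = 0.30                                     # 217 parts count for untested parts
alt_fpmh = [0.010, 0.005, 0.020, 0.004, 0.006, 0.008]       # converted from ALT lower bounds
hdbk217_fpmh = [0.020, 0.010, 0.030, 0.006, 0.010, 0.012]   # 217 estimates for the same 6 parts

# The step described above: add each part's 217 estimate to its ALT-derived rate
modified = [a + h for a, h in zip(alt_fpmh, hdbk217_fpmh)]

# Series system with constant rates: the system rate is a plain sum
system_fpmh = parts_count_fpmh + sum(modified)
system_rate_per_hour = system_fpmh / 1e6
mtbf_hours = 1.0 / system_rate_per_hour  # the "simple inversion"

# Mapping back to the actual 10-year requirement is valid only under the exponential model
r_10_years = math.exp(-system_rate_per_hour * 10 * HOURS_PER_YEAR)
print(f"system rate {system_fpmh:.3f} FPMH, MTBF {mtbf_hours:,.0f} h, R(10 y) {r_10_years:.3f}")
```

With these made-up inputs the system appears to clear the 95% target at 10 years, but only because every step assumes a constant failure rate; change the time-to-failure behavior of any part, or the duty cycle, and the conclusion may not hold. The result also says nothing about the 1-month or 5-year requirements.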
What Should the Vendor Have Done Instead?
There are a lot of ways to create a new design and meet reliability requirements.
- The build-test-fix, or reliability growth, approach works well in many circumstances.
- Using failure data from similar fielded systems. It may provide a reasonable bound for an estimate for a new system. It may also let you limit the accelerated testing to only the novel, new, or high-risk areas of the new design, given that much of the design is (or may be) similar to past products.
- Using a simple reliability block diagram or fault tree analysis model to assemble the estimates, test results, and engineering stress/strength analysis (all better estimation tools than parts count, in my opinion) and calculate a system reliability estimate.
- Using a risk-of-failure approach with FMEA and HALT to identify the likely failure mechanisms, then characterizing those mechanisms to determine their time-to-failure distributions. If there are one or a few dominant failure mechanisms, that work would provide a reasonable estimate of the system reliability.
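As an illustration of the block diagram option, here is a minimal sketch, assuming independent blocks and a hypothetical structure (three blocks in series plus a redundant pair). The block names and each block’s 10-year reliability are made-up values standing in for whichever evidence suits that block: field data, ALT results, or stress/strength analysis.

```python
from math import prod


def series(*r):
    """Every block must survive: R = R1 * R2 * ..."""
    return prod(r)


def parallel(*r):
    """At least one redundant block must survive: R = 1 - (1 - R1)(1 - R2)..."""
    return 1.0 - prod(1.0 - x for x in r)


# Hypothetical 10-year reliabilities, each from the evidence best suited to that block
r_display = 0.990   # e.g., field data from a similar fielded display
r_receiver = 0.985  # e.g., ALT on a new receiver design
r_power = 0.970     # e.g., stress/strength analysis of the power supply
r_storage = 0.980   # e.g., supplier life data for one storage device

# Assumed structure: display, receiver, and power in series with two redundant storage devices
system_r = series(r_display, r_receiver, r_power, parallel(r_storage, r_storage))
print(f"Estimated system R(10 years) = {system_r:.4f}")
```

With these values the estimate is about 0.946, short of 0.95, and the product makes it easy to see that the power supply block dominates the shortfall while the redundant pair costs little.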
In all cases, focus on failure mechanisms and on how the time-to-failure distribution changes given changes in stress, environment, and use conditions. Monte Carlo simulation may provide a suitable means to analyze a varied mixture of data to determine an estimate. Express the result as reliability: the probability of success over a stated duration.
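Here is a minimal Monte Carlo sketch, assuming three hypothetical failure mechanisms with Weibull time-to-failure distributions and a hypothetical use-condition mix in which 30% of vehicles operate in a hot climate that shortens capacitor life. Each simulated unit fails at its earliest mechanism (competing risks), and the result is reported as reliability at 10 years rather than as an MTBF. Python’s `random.weibullvariate(alpha, beta)` takes the scale as `alpha` and the shape as `beta`.

```python
import math
import random

MISSION_YEARS = 10.0

# Hypothetical mechanisms: name -> (Weibull shape beta, scale eta in years) under normal use
NORMAL_USE = {
    "solder joint fatigue": (2.5, 60.0),
    "capacitor wear-out": (4.0, 35.0),
    "connector fretting": (1.5, 150.0),
}
# Hypothetical harsher scenario: a hot climate shortens capacitor life
HOT_USE = {**NORMAL_USE, "capacitor wear-out": (4.0, 20.0)}
HOT_FRACTION = 0.30  # assumed share of vehicles operating in the hot climate


def time_to_first_failure(rng, mechs):
    """Competing risks: the unit fails at the earliest of its mechanisms."""
    # random.weibullvariate(alpha, beta) takes alpha = scale, beta = shape
    return min(rng.weibullvariate(eta, beta) for beta, eta in mechs.values())


def simulate_reliability(t, trials=100_000, seed=1):
    """Fraction of simulated units, each with a sampled use condition, surviving past t."""
    rng = random.Random(seed)
    survivors = 0
    for _ in range(trials):
        mechs = HOT_USE if rng.random() < HOT_FRACTION else NORMAL_USE
        if time_to_first_failure(rng, mechs) > t:
            survivors += 1
    return survivors / trials


def closed_form_reliability(t):
    """Cross-check for this simple case: a mix of products of independent Weibull survivals."""
    def r(mechs):
        return math.prod(math.exp(-((t / eta) ** beta)) for beta, eta in mechs.values())
    return (1 - HOT_FRACTION) * r(NORMAL_USE) + HOT_FRACTION * r(HOT_USE)


r_mc = simulate_reliability(MISSION_YEARS)
print(f"Monte Carlo R(10 y) = {r_mc:.4f}, "
      f"closed-form check = {closed_form_reliability(MISSION_YEARS):.4f}")
```

This simple case also has a closed form, included only as a check. Monte Carlo earns its keep once use conditions, stresses, or mechanism parameters are themselves uncertain and sampled per unit, where a closed form becomes awkward.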
In short, do the work to understand the design, its weaknesses, and its time-to-failure behavior under different use and condition scenarios, and make justifiable assumptions only when necessary.
Summary
We engage vendors to supply custom subsystems given their expertise and ability to deliver the units we need for our vehicle. We expect them to justify that they meet reliability requirements in a rational and defensible manner. While we do not want to dictate their approach to the design or to the estimate of reliability performance, we certainly have to judge whether their claims that they meet the requirements are acceptable.
What do you report when a customer asks if your product will meet the reliability requirements? Add to the list of possible approaches in the comments section below.
Paul Franklin says
Your reader was right to wonder. And it is good advice to consider the likely failure modes and design to minimize their impact.
I keep reminding managers that you can do all the analysis you’d like, and all the testing you can afford, and you still won’t know if you meet any quantitative requirement (e.g., 95% reliability at 10 years or 50% at 3 years).
But I still recommend that we do the analysis, and we do the testing, and we look at field data. And then, we ask how we can minimize the rate of occurrence of the failure modes we know about and what happens if some event occurs. This gives good results. We have some quantitative estimates, but more importantly, we know what the limits of those estimates are likely to be *AND* we know that we have done what we can to minimize the failure rate (even if it is time varying) and we have worked to minimize the effects of those failures that do occur. And that’s probably more effective than computing large numbers.
Fred Schenkelberg says
Thanks for the comment, Paul. Yes, predictions about the future are tough and include plenty of uncertainty. I think the key is understanding how our work to address reliability actually works, and the limitations and assumptions involved.
We will always have remaining risks of product failure when shipping; working to understand and reduce those risks is what’s key.
Cheers,
Fred
Kirk Gray says
Good article, Fred. Another amazing aspect not mentioned in this article is that the last update of MIL-HDBK 217F was in 1995, over 20 years ago! The simple formulas used in MIL-HDBK 217F and the derived FIT rates for components were fundamentally flawed back then, and even more so 20 years later! It was also removed as a government contractual reference document shortly after the last revision, thanks in part to the work of Prof. Michael Pecht and CALCE. Unfortunately, as illustrated in your article, MIL-HDBK 217F is still being used today. In our recently published book, co-authored with John J. Paschkewitz and published by Wiley and Sons, titled “Next Generation HALT and HASS: Robust Design of Electronics and Systems”, we reprinted a paper co-written by the US Army and CALCE titled “Reliability Prediction – A Continued Reliance on a Misleading Approach”. It can be downloaded from and a link to purchase our new HALT and HASS book can be found at http://www.acceleratedreliabilitysolutions.com