How to Adjust Parameters to Achieve MTBF
A troublesome question arrived via email the other day. The author wanted to know whether I could help them adjust the parameters of a parts count prediction so that it arrived at the customer’s required MTBF value.
I was blunt with my response.
My response should have been that they should focus on improving the reliability of the design and not worry about the results of a 217-based prediction.
The list of issues in the question:
- Using MTBF to specify reliability
- Using MTBF to measure reliability
- Using 217 based parts count approach to estimate field reliability
- Attempting to adjust or tweak parts count calculations to improve results
There are undoubtedly other issues, yet let’s address these first.
Using MTBF to specify reliability
Stop it. MTBF is only an average (the inverse of a presumed constant failure rate), and without a duration attached it has little meaning. Even with a duration it isn’t very informative: it doesn’t help us address requirements for an early life or mission duration, a warranty or deployment duration, or a useful or expected lifetime duration. It lets us ignore early life failures and wear out failures as long as the overall average is fine. It also permits us to test assuming the exponential distribution, thus avoiding having to test long enough to reveal wear out mechanisms.
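As a sketch of why a bare MTBF says so little (all numbers here are hypothetical, not from the question), two designs can share the exact same 50,000-hour MTBF yet deliver very different reliability over a mission:

```python
import math

MTBF = 50_000.0    # hours; hypothetical customer requirement
mission = 5_000.0  # hours; hypothetical mission duration

# Design A: constant failure rate (exponential), mean life = MTBF
r_expo = math.exp(-mission / MTBF)

# Design B: wear-out (Weibull, shape beta = 3), with the characteristic
# life eta chosen so the mean life equals the very same MTBF
beta = 3.0
eta = MTBF / math.gamma(1 + 1 / beta)
r_weib = math.exp(-((mission / eta) ** beta))

print(f"Exponential R(mission) = {r_expo:.3f}")  # ~0.905
print(f"Weibull     R(mission) = {r_weib:.3f}")  # ~0.999
```

Same average, yet one design fails nearly 10% of units during the mission and the other almost none. The MTBF alone cannot distinguish them.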
Using MTBF to measure reliability
Hopefully they are using methods other than just a parts count database to estimate reliability. Either way, using MTBF masks the nature of the failures over time. MTBF is an average, and there are many ways to achieve a specific MTBF value. Instead, use reliability and the underlying life distributions, which detail the changing nature of the expected failure rates with time and/or stress.
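To illustrate the changing failure rate that a single MTBF number discards, here is a minimal sketch (the shape and characteristic-life values are hypothetical) of a Weibull wear-out hazard rate rising sharply with time:

```python
import math

# Hypothetical Weibull wear-out parameters
beta, eta = 3.0, 56_000.0  # shape, characteristic life in hours

def weibull_hazard(t: float) -> float:
    """Instantaneous failure rate at time t for a Weibull(beta, eta)."""
    return (beta / eta) * (t / eta) ** (beta - 1)

for t in (1_000, 10_000, 50_000):
    print(f"h({t:>6} h) = {weibull_hazard(t):.2e} failures/hour")
```

With a shape parameter of 3, the failure rate at 50,000 hours is thousands of times higher than at 1,000 hours; averaging that behavior into one number hides exactly what a designer or maintainer needs to know.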
Using 217 based parts count approach to estimate field reliability
Mil Hdbk 217F and similar parts count methods are not tools to predict field performance. The descriptions and lists of objectives for such a study clearly state so (in the various parts count tools I’ve encountered). The tool exists to permit comparisons, to explore the impact of design changes, etc. It is specifically not for comparing against performance requirements or estimating actual field failure rates.
It is the wrong tool, so stop using it as a means to estimate reliability.
Attempting to adjust or tweak parts count calculations to improve results
Having played with and used different parts count methods over the years, I’ve found that there are plenty of ways to adjust and modify the results: derating factors, quality factors, temperature assumptions, etc. The list is endless with some tools. In theory we have all the information and can include the various modifiers to improve the ability of the prediction to reveal weaknesses and the impact of changes.
Using fewer parts or running electronics at a cooler temperature will, in general, improve system reliability. That is one very good thing about parts count predictions: they encourage changes that are actually good for the reliable performance of the system. Using the result as an estimate of future performance, however, has been shown over and over again to have no merit.
Fiddling with settings and assumptions to hit a target is repugnant and most likely wrong according to any engineering code of ethics. We are not in the business of adjusting calculations to get the desired results; we use the available tools to the best of our ability and make decisions accordingly. If the results are not good enough (despite being neither relevant nor useful, as in this case), we should not adjust the assumptions and settings to get the desired result.
That is just wrong.
Instead, focus on creating a reliable design that meets the customer’s expectations. If the requirement is MTBF, ask what they really want in terms of reliability. Use physics of failure and life testing techniques to estimate future performance. Provide the customer with useful information, and if absolutely required, include the MTBF along with an explanation of how it was derived from the actually useful information.
WILLIAM THORLAY says
From a maintenance point of view, we deal with equipment rather than components or parts in isolation. For that reason, most of the failure data we get refers to the equipment, sometimes complex, like a rolling mill, sometimes simple, like an induction motor or a pump. So they are repairable systems. Even if I can separate the data by failure modes, it is still difficult to make any kind of prediction. For this reason, I prefer to analyse the criticality of each piece of equipment in order to determine failure consequences and take proactive measures, like CBM or PM through the RCM tool. Having said that, do you think that all the statistical tools we’ve learned to understand reliability are useless for making predictions? Would you recommend the use of RCFA instead of analysis of the failure data, and putting more focus on availability and reliability by understanding the physics of failures?
Fred Schenkelberg says
Hi William,
Thanks for the comment. Regression is regression and useful for modeling time to failure data, or built into physics of failure models, yet it has the real limitation concerning extrapolation. Use with caution.
An accurate statistical model is more likely to predict the future performance than a poor model, yet so few take the time to use the data to create and verify an adequate model.
In the maintenance world, you may benefit by using mean cumulative function and associated plotting. See Wayne Nelson’s work on the topic.
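As a rough illustration of the mean cumulative function for repairable systems (the machine names and repair times below are made up, and the simple estimator assumes every system is observed over the same window with no early censoring):

```python
# Hypothetical repair times (hours) for three identical machines,
# each observed over the same 10,000-hour window.
repairs = {
    "mill-1": [1200, 4800, 9100],
    "mill-2": [3000, 7500],
    "mill-3": [2200, 5100, 8000, 9600],
}
n_systems = len(repairs)

# Nonparametric MCF: at each repair time, the mean cumulative number
# of repairs per system so far (valid here because no system drops
# out of observation before the end of the window).
events = sorted(t for times in repairs.values() for t in times)
mcf = [(t, (i + 1) / n_systems) for i, t in enumerate(events)]

for t, m in mcf:
    print(f"t = {t:>5} h   MCF = {m:.2f}")
```

Plotting MCF against time shows whether the repair rate is steady, improving, or worsening, which is exactly the information a single failure rate for the whole machine hides. Nelson’s work covers the general estimator with staggered observation windows.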
http://nomtbf.com/2012/02/graphical-analysis-of-repair-data/
Cheers,
Fred
Jim Borggren says
I agree with a great deal of what you say about the use and misuse of reliability parameters such as MTBF, and also agree that an approach based on physics of failure techniques would be preferable.
However, in the world we live in, most textbooks describe methods to apply MTBF. Also, system reliability analysis using RBDs or FTs virtually demands it. We also have reams of data available (OREDA, NPRDS, etc.) which provide exponential failure rates for plugging into these models.
Where is the comparable information on how to apply physics of failure analysis, and the numbers to use in the analysis, so that we can generate the numbers that are required for reliability and safety analysis?
Fred Schenkelberg says
Hi Jim,
Thanks for the comment and request.
RBDs and reliability growth models do not demand MTBF or simple static failure rates; that is a simplification and a very crude approximation. You can and should use life distributions appropriate to the component failure mechanisms in those models.
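For instance, a series RBD combines different life distributions directly, since the system survives only if every block survives. A small sketch (the two blocks and their parameters are hypothetical, not from any database):

```python
import math

# Hypothetical two-block series system: an electronics board modeled
# with a constant failure rate, and a bearing with Weibull wear-out.
def r_board(t: float, mtbf: float = 80_000.0) -> float:
    return math.exp(-t / mtbf)             # exponential block

def r_bearing(t: float, beta: float = 2.5, eta: float = 40_000.0) -> float:
    return math.exp(-((t / eta) ** beta))  # Weibull wear-out block

def r_series(t: float) -> float:
    # Series RBD: multiply the block reliabilities at the same time t.
    return r_board(t) * r_bearing(t)

for t in (5_000, 20_000, 40_000):
    print(f"R_sys({t:>6} h) = {r_series(t):.3f}")
```

Nothing in the RBD math requires a constant failure rate; each block simply contributes its own R(t).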
Yes, there is a lot of compiled data; too bad the data isn’t useful. If they would publish the Weibull or other appropriate distribution of the compiled data (and they most likely have the time to failure information to make those calculations), we all would be a bit better off. It makes no sense to me to list a single failure rate for a pump when we know pumps wear out. The failure rate changes with time; let’s use that information instead. Demand better data from your sources.
And where do we get better data? From our own field data analysis, and from our own modeling and experimentation (start with a literature search on the failure mechanisms, where you can find really good work and models); also check out the RIAC WARP site, where they list papers and PoF models. For novel or new items, do the characterization work to build your own model.
Reliability engineering is not about looking up inadequate tables of numbers and turning the crank; it’s about enabling our teams to make decisions as they attempt to create reliable products. Do the research, learn the math, conduct the testing, and get accurate estimates.
VITA 51 is a set of standards that lists a number of PoF models and considerations to apply to your situation. University of Maryland CALCE and DFR Solutions both have many folks who publish PoF papers, and both have built software to analyze circuit assemblies. The New Weibull Handbook and current editions of Practical Reliability, Applied Reliability, and others have many models for specific failure mechanisms. Plus, they outline how to conduct your own analysis and testing.
The big databases of failure rates are dead. They are less than useless for providing a meaningful estimate of a product’s reliability. Sure, it takes some work, research, and effort on your part, yet it will provide meaningful and valuable information to your program.
Cheers,
Fred