What do we know with MTBF
How many times have you been given only MTBF, a single value? The data sheet or sales representative or website provides only MTBF and nothing more. We see it all the time, right? It is provided as the total answer to “what is the reliability performance expectation?”
So, given MTBF what do we really know about reliability?
As you may suspect, not much.
Assumed Knowledge
Most of the time we assume MTBF is the expected value, or mean, of an exponential distribution, which imposes a constant hazard rate; we also generally know that assumption is not true. We know that MTBF is the inverse of the failure rate, so we may try a few calculations of expected failures over some period of time.
We also know that our estimates and calculations are often inconsistent with actual reliability performance. So, we really do not know very much that is useful.
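To make the "few calculations" concrete, here is a minimal sketch of the kind of estimate people attempt when handed only MTBF. It assumes exponentially distributed lifetimes (the very assumption questioned above); the fleet size, MTBF value, and function name are hypothetical and chosen only for illustration.

```python
import math


def expected_failures(units: int, hours: float, mtbf_hours: float) -> float:
    """Expected number of units failed by `hours`, ASSUMING exponential lifetimes."""
    failure_rate = 1.0 / mtbf_hours  # constant hazard rate, failures per hour
    prob_fail = 1.0 - math.exp(-failure_rate * hours)
    return units * prob_fail


# Hypothetical numbers for illustration only.
fleet_size = 1000
mtbf_hours = 100_000.0
one_year_hours = 8760.0

estimate = expected_failures(fleet_size, one_year_hours, mtbf_hours)
print(f"Expected failures in the first year: {estimate:.1f} of {fleet_size}")
```

The arithmetic is easy; the trouble is that the result is only as good as the constant hazard rate assumption behind it.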
Basis for MTBF reporting
In at least one case, the specific MTBF value was listed on data sheets only because the same number also appeared on competitor data sheets. Pure fiction.
In some cases the value is based on limited testing of units that did not show any failures. Say 1,000 capacitors run for 100 hours without any failures. That is 100,000 unit-hours of operation, which would imply at least a 100,000 hour MTTF (often reported as an MTBF – just to continue to add to confusion – not that it matters much). In such testing we do not know anything about the behavior beyond 100 hours.
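How much does a zero-failure test actually support? A rough sketch, again assuming exponential lifetimes: with T total unit-hours and no failures, the one-sided lower confidence bound on MTBF at confidence C is T / (-ln(1 - C)). The function name below is hypothetical, and the confidence levels are picked only for illustration.

```python
import math


def zero_failure_mtbf_lower_bound(unit_hours: float, confidence: float) -> float:
    """One-sided lower bound on MTBF after a zero-failure test.

    ASSUMES exponential lifetimes: P(no failures) = exp(-T / MTBF),
    so the bound solves exp(-T / MTBF) = 1 - confidence.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    return unit_hours / -math.log(1.0 - confidence)


unit_hours = 1000 * 100  # 1,000 capacitors run for 100 hours each

for confidence in (0.60, 0.90, 0.95):
    bound = zero_failure_mtbf_lower_bound(unit_hours, confidence)
    print(f"{confidence:.0%} confidence: MTBF of at least {bound:,.0f} hours")
```

Under this model, the "at least 100,000 hours" claim holds only at roughly 63% confidence; at 90% confidence the same test supports a much smaller number. And none of this says anything about behavior beyond the 100 hours tested.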
At 50,000 hours of operation (a little over 5 years of full time use) the expected probability of survival (reliability) based on the assumed exponential distribution is R(50k) = exp [ – 50k / 100k ] = 0.607, or about 61% reliable, which we know from experience is generally not true.
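The calculation above is easy to reproduce for other durations. This is a small sketch under the same assumed exponential distribution; the function name and the list of durations are illustrative choices, not part of any data sheet.

```python
import math


def exponential_reliability(hours: float, mtbf_hours: float) -> float:
    """Probability of surviving to `hours`, ASSUMING a constant hazard rate."""
    return math.exp(-hours / mtbf_hours)


mtbf_hours = 100_000.0  # the value implied by the zero-failure test above

for hours in (1_000, 10_000, 50_000, 100_000):
    r = exponential_reliability(hours, mtbf_hours)
    print(f"{hours:>7,} h: R = {r:.3f}")
```

Note that at a duration equal to the MTBF itself, the model predicts that only about 37% of units survive, which is rarely what people picture when they read the number.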
In some cases, the value is derived from a database of part MTBF values using a parts count prediction with unstated assumptions, derating factors, quality factors, or other modeling parameters. Keep in mind that these methods are not intended to predict reliability performance in any realistic manner.
In some cases, the MTBF value is determined based on extensive reliability testing. The testing may include testing to failure and fitting to a Weibull distribution. Instead of reporting the Weibull parameters for a fan that wears out, the marketing folks calculate the MTBF of the distribution and report just the MTBF. The MTBF value is a large, impressive number, and higher numbers are seen as better, in contrast to the informative and useful Weibull distribution results.
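A short sketch of what gets lost in that reduction. The Weibull parameters below (shape 3, characteristic life 60,000 hours) are made up for a hypothetical fan that wears out; they are not from any real test. The code computes the mean of that Weibull, then compares the true Weibull reliability with what someone would calculate from the reported MTBF alone, assuming an exponential distribution.

```python
import math


def weibull_mean(beta: float, eta: float) -> float:
    """Mean life of a two-parameter Weibull (shape beta, scale eta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)


def weibull_reliability(hours: float, beta: float, eta: float) -> float:
    """Probability of surviving to `hours` under the Weibull model."""
    return math.exp(-((hours / eta) ** beta))


# Hypothetical wear-out parameters, for illustration only.
beta, eta = 3.0, 60_000.0
mtbf_hours = weibull_mean(beta, eta)
print(f"Reported MTBF: {mtbf_hours:,.0f} hours")

for hours in (10_000, 30_000, 80_000):
    r_weibull = weibull_reliability(hours, beta, eta)
    r_from_mtbf = math.exp(-hours / mtbf_hours)  # what a reader of the MTBF would compute
    print(f"{hours:>6,} h: Weibull R = {r_weibull:.3f}, from MTBF alone R = {r_from_mtbf:.3f}")
```

With these assumed parameters, the MTBF-only calculation understates reliability early in life and overstates it once wear out sets in. The two Weibull parameters carry that information; the single mean does not.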
In some cases the MTBF value is calculated based on reported field issues. This may be relatively accurate for the expected field performance if there aren’t any early life or wear out failure mechanisms.
These last two are the most painful, as the information we really need is there and then ‘dumbed down’ for reporting. Using inadequate or meaningless methods isn’t much better, yet given just MTBF, how do we know the source and what could be available?
We don’t.
We really do not know anything useful when given MTBF. Realize this is the case and go find or create better information.
Dave Robson says
I try to describe the ‘traditional’ MTBF as a point in space with an underlying exponential distribution, without any confidence bounds. Many people still don’t get it.
Fred Schenkelberg says
Hi Dave, unfortunately the lack of statistical knowledge among otherwise well educated folks is troublesome. cheers, Fred
Paul Franklin says
Very often, MTBF is the result of some calculation that uses data from an unknown source and with unknown quality. Nothing is usually known about *how* the data was collected or the methods used to analyze it. Nothing is usually known about the failure analysis that the reliability engineer did (or even if there was any).
If there’s field data, even from your company, many of these same problems remain, and they are compounded by the fact that the population of product in use will be smaller than the global population, and any one company may not see all the relevant failure modes.
And while a constant failure rate model works on a large population, it very often doesn’t work for a local division of a company. The analogy I like to use is that a tire manufacturer knows that the average life of a tire is 60,000 miles and that means that it needs to produce 4132 tires per day. That is of little use to me, since I have only 4 tires on a car and I will buy more when the tread has worn enough (whether or not that happens at 60,000 miles). And it is of small comfort to me if my tire is damaged beyond repair by a pothole.
My point is this: the numbers are a useful guide, but they are only that. Doing real modeling and understanding how you estimate the parameters you use in those models is far more important. A reliability engineer’s objectives are (1) to do what’s possible to prevent failures, and (2) to plan what happens when things go wrong.
Fred Schenkelberg says
Hi Paul, thanks for the comment – as you suspect I agree that MTBF isn’t very useful. I would even argue that with large populations we can and should use accurate time to failure distributions and/or a detailed understanding of failure mechanisms to make decisions.
I once heard the request, “I don’t care what reliability you claim, as long as you only sell me the one that works.” The metrics we use help us to set goals, track development and field performance and, most importantly, make decisions. Using the best metrics (most accurate and easy to understand, imho) only helps us create reliable products and systems.
Cheers,
Fred
Don (PLC Training) says
So true. Data is often deceptive when not put into the larger perspective.
Example: 80% surveyed say my burgers are the best in the world. (larger perspective… 10 of my family members were the only people surveyed. LOL)
Fred Schenkelberg says
Hi Don,
Thanks for the comment and example – and, you could increase the sample size for burger tasting…. I’m up for that 😉 — just to help your stats, of course.
Cheers,
Fred