What is MTBF?
The acronym MTBF is commonly known in our field as Mean Time Between Failure.
It is also associated with repairable systems in most textbooks.
It is also denoted as the theta parameter for an exponential distribution.
It is referenced as a metric for reliability, too. Oh, and it is the inverse of the failure rate.
And it is misunderstood and misused by many. I digress, as there is plenty already written on the perils of MTBF.
What is MTBF? And where and how should it be used, if at all?
According to the old MIL-STD-721C (1981):
MEAN-TIME-BETWEEN-FAILURE (MTBF): A basic measure of reliability for repairable items: The mean number of life units during which all parts of the item perform within their specified limits, during a particular measurement interval under stated conditions.
The conventional method to estimate MTBF is to tally up the hours of operation of a set of equipment and divide by the number of failures. It is not the only way to calculate MTBF, yet it does provide a reasonable unbiased estimate.
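The tally described above can be sketched in a few lines of Python. The unit hours and failure count here are made-up illustration values, and `mtbf_point_estimate` is just a name chosen for this sketch.

```python
def mtbf_point_estimate(operating_hours, failure_count):
    """Total operating hours across all units divided by the number of failures."""
    if failure_count <= 0:
        raise ValueError("need at least one failure for a point estimate")
    return sum(operating_hours) / failure_count


# Hypothetical data: hours of operation for five units, with three failures in total.
hours = [1200.0, 950.0, 1430.0, 800.0, 1620.0]
failures = 3

mtbf = mtbf_point_estimate(hours, failures)
print(f"MTBF estimate: {mtbf:.0f} hours")
```

Note that the result is a single number. Nothing in it records when, during those hours, the failures occurred.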
We often assume a constant failure rate for work involving parts count predictions, system reliability estimates or calculations, or just to simplify the calculations. Often it is unnecessary and may lead to erroneous results.
While MTBF is the single parameter for an exponential distribution, which implies a constant failure rate, nearly all distributions also have a mean – which can be estimated and denoted as MTBF.
When MTBF is the only reliability measure provided for an item, with no other information, we have to rely on the validity of the assumed exponential distribution. Yet MTBF is not restricted to just this distribution, and that ambiguity leads to misuse of the measure.
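To see why the assumed distribution matters, here is a minimal sketch that holds the mean at the same value and compares the reliability at one point in time under a few Weibull shapes (a shape of 1 is the exponential case). The 1,000 hour MTBF, the 500 hour mission time, and the shape values are assumptions picked only for illustration.

```python
import math


def weibull_scale_for_mean(mean, shape):
    """Scale parameter that gives the requested mean: mean = scale * Gamma(1 + 1/shape)."""
    return mean / math.gamma(1.0 + 1.0 / shape)


def weibull_reliability(t, shape, scale):
    """Probability of surviving to time t; shape == 1 reduces to the exponential case."""
    return math.exp(-((t / scale) ** shape))


mtbf = 1000.0   # hypothetical mean, in hours, shared by every distribution below
t = 500.0       # hypothetical time of interest, in hours

results = {}
for shape in (0.5, 1.0, 3.0):
    scale = weibull_scale_for_mean(mtbf, shape)
    results[shape] = weibull_reliability(t, shape, scale)
    print(f"shape={shape}: R({t:.0f} h) = {results[shape]:.3f}")
```

With these inputs the three cases share one MTBF, yet the chance of surviving 500 hours ranges from roughly 0.37 (shape 0.5) to roughly 0.91 (shape 3). The MTBF alone does not tell you which of these you have.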
MTBF is a basic measure
It is not a good measure or a useful measure when the failure rate changes with equipment use or time. The measure masks that changing failure rate and implies a simple average describes enough information to make good decisions.
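A small sketch of that masking effect, using two invented failure histories for a repairable system observed over the same 3,000 hours. Both the failure times and the three-interval split are assumptions for illustration only.

```python
def mtbf(total_hours, failure_times):
    """Conventional estimate: observed hours divided by number of failures."""
    return total_hours / len(failure_times)


def failures_per_interval(failure_times, total_hours, n_intervals):
    """Count failures in equal-width intervals across the observation period."""
    width = total_hours / n_intervals
    counts = [0] * n_intervals
    for t in failure_times:
        # A failure exactly at the end of the period belongs to the last interval.
        index = min(int(t // width), n_intervals - 1)
        counts[index] += 1
    return counts


total_hours = 3000.0
# Hypothetical failure times (hours): one history dominated by early failures,
# the other by late, wear-out style failures.
early = [50, 120, 200, 340, 600, 2500]
wear_out = [400, 2100, 2450, 2700, 2850, 2950]

for name, times in (("early", early), ("wear-out", wear_out)):
    print(name, mtbf(total_hours, times), failures_per_interval(times, total_hours, 3))
```

Both histories give an MTBF of 500 hours, while the interval counts show the failures piling up at opposite ends of the period. The decisions you would make about burn-in, spares, or replacement timing differ completely between the two.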
The next time you run across an MTBF value, ask which distribution it came from, and ask
- for the supporting data and evidence that the average is sufficient to fully describe the time to failure behavior.
- over what time period the MTBF is valid (this one often confuses those with little knowledge of reliability engineering or of how MTBF is calculated and what it means).
- about the failure mechanisms and what is expected to fail and when.
How do you define MTBF? This of course is a loaded question and you should be prepared to support your definition.
Paul Franklin says
MTBF is *only* meaningful in the context of a constant failure rate model. Other distributions have a characteristic life (such as Weibull), which at some level can be compared to MTBF. But they just aren’t the same thing.
Weibull (and some other distributions) requires more than one parameter to characterize reliability as a function of time. I find it far less confusing and much more useful to use the language “failure intensity” instead of “failure rate” when the hazard function varies with time. Am I being pedantic? Perhaps, but all too often, people use the notion that MTBF = Time/Failures, giving only lip service to the idea that this measure often varies with time. If there are more (or fewer) failures in an interval than you expected from prior analysis, it is almost always a good idea to figure out why.
Fred Schenkelberg says
Hi Paul, I like the distinction and should start using it. Thanks for the comment. cheers, Fred
Mark Powell says
Fred,
For fun, more properly “MTBF” is an abbreviation. Acronyms must be pronounceable.
Mark Powell
Fred Schenkelberg says
Hi Mark, – lol, learn something every day. cheers, Fred
Hilaire Perera says
For people who are unable to establish a failure/time distribution to calculate the reliability of their product, the easiest way to track reliability is to calculate MTTF (MTBF) periodically. “Single point” calculations are not suitable for warranty, spares allocation, etc. The MTTF (MTBF) number should be calculated at a confidence level.
Fred Schenkelberg says
Sure it’s easy, yet when would you ever not have time to failure information? As a minimum we have ship and return dates. cheers, Fred
Paul says
Sometimes there is an important distinction between failures and returns.
In consumer equipment, returns can happen for all kinds of reasons unrelated to failure, including upgrading of equipment, ending service, etc. It’s also possible that policy drives replacement, for instance when a company deploys a computing environment. And, the economics of repair and life cycle cost management may increase scrap rates, which also may include unfailed equipment.
Like everything else, the process that generates returns is important, and it is important to know what that process is, and use the data accordingly.
Fred Schenkelberg says
Hi Paul, good comment and advice. I agree that not all returns are failures (although many are when looking at the overall customer experience and costs to business).
I’m mostly poking at those that dismiss returns that do not have identifiable or repeatable failures – if the customer says it doesn’t work, it’s a failure.
I agree that if they are upgrading or returning overstock equipment, then that is not necessarily a reliability failure, but rather a failure of some other process, not just part of doing business.
Cheers,
Fred
Bill Keeter says
Seems to me that Reliability Growth Plot parameters make more sense than MTBF. MTBF really doesn’t tell us anything about the reliability because it doesn’t account for whether failures occurred during the beginning, middle, end, or throughout the period in question.
Fred Schenkelberg says
Thanks Bill, for growth evaluations, if you consider the individual failure mechanisms and their increasing or decreasing failure rates, then I agree. cheers, Fred