How to Explain the Perils of MTBF Use


With a little practice and an awareness of the many perils of using MTBF, you can become adept at building clear, concise lines of reasoning that help others at least try a better way.

A trivial objection is ‘our product is not repairable so we’re using MTTF’. The math to estimate MTBF and MTTF from data is the same: total hours divided by total failures. Both are estimates of the average time to failure. Therefore, most of the arguments for switching away from MTBF apply equally to MTTF.
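As a minimal sketch of that shared calculation, the snippet below computes the point estimate from totals. The function name and the numbers are hypothetical, chosen only for illustration.

```python
def estimate_mtbf(total_hours, total_failures):
    """Point estimate of MTBF (or MTTF): total operating hours / total failures."""
    if total_failures <= 0:
        raise ValueError("need at least one failure for a point estimate")
    return total_hours / total_failures


# Hypothetical data: 10,000 accumulated operating hours, 4 failures observed.
mtbf = estimate_mtbf(10_000, 4)
failure_rate = 1 / mtbf  # average failures per hour

print(f"MTBF = {mtbf} hours, failure rate = {failure_rate} per hour")
```

Note that the result is a single number. Nothing in it records when in the product's life the failures occurred.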

Misunderstanding 1

When someone suggests MTBF is a failure-free period, try not to snort or laugh; that doesn’t help. Instead, point out that MTBF is the inverse of the failure rate. So if using hours, the inverse of MTBF is the average chance of failure each hour. Then, using the exponential distribution reliability function, R(t) = exp(-t/MTBF), you can quickly show how many are expected to survive to the end of the so-called failure-free period: only about 1/3 (roughly 37%) of the items, with the remaining 2/3 or so having failed.
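The arithmetic takes only a few lines to demonstrate. This is a sketch assuming a constant failure rate (the exponential distribution); the MTBF value is hypothetical, and the result is the same for any MTBF.

```python
import math


def exponential_reliability(t, mtbf):
    """Probability of surviving to time t under a constant failure rate."""
    return math.exp(-t / mtbf)


mtbf = 50_000.0  # hypothetical MTBF in hours
surviving = exponential_reliability(mtbf, mtbf)  # evaluate at t = MTBF

print(f"Fraction surviving at t = MTBF: {surviving:.3f}")  # 0.368
print(f"Fraction failed at t = MTBF:    {1 - surviving:.3f}")  # 0.632
```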

Misunderstanding 2

When someone suggests we only use MTBF because ‘we have always used MTBF’, I first ask if the metric and its meaning are well understood. There seem to be enough people holding misunderstanding 1 that this alone may be reason enough to persuade someone to try another measure. If not, I ask ‘so, how many are expected to survive over the first year?’ This question usually surfaces one or more other misunderstandings.
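To have an answer ready for the first-year question, a sketch like the following can help. It assumes a constant failure rate and continuous operation (8,760 hours per year); the fleet size and MTBF are hypothetical.

```python
import math

HOURS_PER_YEAR = 8760  # assumes continuous, around-the-clock operation


def expected_survivors(n_units, hours, mtbf):
    """Expected number of units still working after `hours`, constant failure rate."""
    return n_units * math.exp(-hours / mtbf)


fleet = 1000      # hypothetical number of units shipped
mtbf = 50_000.0   # hypothetical MTBF in hours

alive = expected_survivors(fleet, HOURS_PER_YEAR, mtbf)
print(f"{alive:.0f} of {fleet} expected to survive the first year")
```

With these illustrative numbers, roughly 84% of the units survive the year, which often surprises people who read a 50,000-hour MTBF as more than five years of failure-free operation.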

Misunderstanding 3

‘I use MTBF to set the warranty period or our maintenance strategy’. MTBF is the inverse of the average failure rate and carries no information about how the failure rate changes over time. Are we dealing with a decreasing or increasing failure rate? How do we know the failure rate is actually the same for each hour? Given only an MTBF value and the assumption of a constant hazard rate, the only maintenance strategy available is to replace or repair upon failure. If the item actually does have an increasing chance of failure with time, MTBF cannot describe that increasing rate. Use Weibull or some other model.
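A short sketch can show how much a single MTBF value hides. The code below builds three Weibull populations with the same mean life but different shape parameters (beta below 1 for a decreasing hazard rate, 1 for constant, above 1 for increasing) and compares their reliability and hazard rate at a few ages. The mean life and beta values are hypothetical.

```python
import math


def weibull_eta_for_mean(mean, beta):
    """Scale parameter eta that gives the requested mean life."""
    return mean / math.gamma(1 + 1 / beta)


def weibull_reliability(t, beta, eta):
    """Probability of surviving to time t."""
    return math.exp(-((t / eta) ** beta))


def weibull_hazard(t, beta, eta):
    """Instantaneous failure rate at time t (t > 0)."""
    return (beta / eta) * (t / eta) ** (beta - 1)


mean_life = 1000.0  # hypothetical: all three populations share this "MTBF"

for beta in (0.5, 1.0, 3.0):
    eta = weibull_eta_for_mean(mean_life, beta)
    for t in (100.0, 500.0, 1000.0):
        r = weibull_reliability(t, beta, eta)
        h = weibull_hazard(t, beta, eta)
        print(f"beta={beta:<4} t={t:>6.0f} h  R(t)={r:.3f}  hazard={h:.6f}/h")
```

All three populations report the same MTBF, yet an early warranty period or a scheduled replacement interval would look very different for each.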

Misunderstanding 4

‘All our competitors and customers use MTBF in our industry’. This may be my favorite. It is generally possible to show quickly that stating reliability directly (probability of success over a duration for a function in an environment) provides a clear metric, and that using Weibull or another suitable model to describe the changing rate of failure over time provides a competitive advantage. By using a clear reliability statement and an appropriate model, even with your customers, you avoid other misunderstandings and help everyone make better decisions. Better decisions concerning reliability mean meeting your customer’s reliability performance expectations.

Misunderstanding 5

Once in a while someone objects to using anything other than MTBF because it is a very easy metric to calculate from time-to-failure data. It is also easy to use for test planning and the like, as many guides and books include examples showing the math required. The line of reasoning around ‘why limit yourself to the computing tools of the 1950s’ generally doesn’t work. It may be that the hesitation is actually related to doing the math involved with Weibull analysis or other approaches. Sure, the formulas and algorithms for anything beyond the exponential distribution and a chi-square table may seem daunting, yet the benefits far outweigh the need to study and practice just a little. The math will come back quickly (you are most likely talking to people with college degrees in an engineering or science field). Reinforce the need to avoid the other misunderstandings, plus the benefits of accurate models and decisions.
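To show that the math is less daunting than it looks, here is a minimal sketch of one common way to fit a two-parameter Weibull distribution: median rank regression, using Benard's approximation for the plotting positions. It assumes complete data (every unit ran to failure, no censoring), regresses the transformed ranks on log time, and uses made-up failure times. Censored data or small samples may call for other methods, such as maximum likelihood.

```python
import math


def fit_weibull_mrr(failure_times):
    """Estimate Weibull (beta, eta) from complete failure data by median rank regression."""
    times = sorted(failure_times)
    n = len(times)
    xs, ys = [], []
    for i, t in enumerate(times, start=1):
        f = (i - 0.3) / (n + 0.4)              # Benard's median rank approximation
        xs.append(math.log(t))                  # x = ln(t)
        ys.append(math.log(-math.log(1 - f)))   # y = ln(-ln(1 - F))
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx                            # slope of the fitted line
    intercept = y_bar - beta * x_bar            # equals -beta * ln(eta)
    eta = math.exp(-intercept / beta)
    return beta, eta


# Hypothetical times to failure, in hours.
hours_to_failure = [410, 620, 790, 880, 1010, 1150, 1300, 1520]
beta, eta = fit_weibull_mrr(hours_to_failure)
print(f"beta = {beta:.2f}, eta = {eta:.0f} hours")
```

A fitted beta well above 1 suggests wear-out, and one below 1 suggests early-life failures. Either is information that a single MTBF value cannot convey.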

There are other misunderstandings, and other effective lines of reasoning, to help someone move beyond using MTBF. What have you found useful for particular misunderstandings?