Illuminating MTBF’s Lack of Information

Here’s a simple illustration of how MTBF oversimplifies data, concealing essential information.

By convention, we tend to use MTBF for repairable data. That is fine.

You may also be aware of my dislike for MTBF, for many different reasons. If you find yourself urging your organization, customer, industry, or anyone else to stop using MTBF, you may want to use this simple example to illustrate the ‘value’ of MTBF.

Three Data Sets of Collected Time of Failure Data

Let’s say we have three assets on the shop floor that have each been running for 1,000 hours. Each has experienced 10 failures requiring repair. The repair time is typically less than an hour (short enough compared to the run time that we can ignore it and keep the analysis simple).

Machine 1 experienced the failures at the following hours of operation:

112	615
198	692
301	820
425	907
509	989

The times are in hours since the equipment was installed, reading down the left column and then the right. The first failure occurred 112 hours after installation, the second at 198 hours, and so on.

Machine 2 experienced the failures at the following hours of operation:

112	760
293	813
480	849
560	898
702	920

Machine 3 experienced the failures at the following hours of operation:

112	350
142	424
191	563
230	710
280	879

Given this data, what would you typically do to glean a better understanding of your equipment?

Just reviewing the data, you can detect the differences between the three machines. Given the differences, you may adjust your maintenance program, or work to determine why the differences exist.
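One way to make those differences explicit is to look at the time between consecutive failures. The following is a minimal sketch, not part of the original analysis: the dictionary and function names are made up for illustration, the first gap is counted from installation (hour 0), and repair time is ignored, as assumed above.

```python
# Failure times in operating hours since installation, copied from the tables above.
failure_times = {
    "Machine 1": [112, 198, 301, 425, 509, 615, 692, 820, 907, 989],
    "Machine 2": [112, 293, 480, 560, 702, 760, 813, 849, 898, 920],
    "Machine 3": [112, 142, 191, 230, 280, 350, 424, 563, 710, 879],
}


def times_between_failures(times):
    """Return the gaps between consecutive failures, counting the first gap from hour 0."""
    previous = [0] + times[:-1]
    return [t - p for t, p in zip(times, previous)]


gaps = {name: times_between_failures(times) for name, times in failure_times.items()}

for name, machine_gaps in gaps.items():
    print(name, machine_gaps)
```

Machine 2’s gaps shrink from 181 and 187 hours early on to 22 hours at the end, while Machine 3’s grow from 30 hours to 169 hours.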

A Simple Dot Plot View

One way to view the data is with a one-dimensional plot. The dot plot provides the location of each failure along the timeline. Here is machine 1’s dot plot:

[Figure: dot plot of machine 1 failure times (mcf-1-dotplot)]

Plus the plots for machines 2 and 3:

[Figure: dot plot of machine 2 failure times (mcf-2-dotplot)]

[Figure: dot plot of machine 3 failure times (mcf-3-dotplot)]

These plots provide a little more visibility than the table of numbers. Machine 1 seems to have evenly spaced failures. Machine 2 has more failures as the equipment ages (like my car did when I was in high school). And Machine 3 seems to be running longer between failures as it ages.

These simple plots reduce the work necessary when just viewing a table of numbers. They further illustrate the differences in the datasets.
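If you do not have a plotting tool handy, a rough text version of a dot plot is enough to see the pattern. This is only a sketch: the function name, the 50-character width, and the 1,000-hour span are choices made for this example, not how the plots in this article were produced.

```python
# Failure times in operating hours since installation, copied from the tables above.
failure_times = {
    "Machine 1": [112, 198, 301, 425, 509, 615, 692, 820, 907, 989],
    "Machine 2": [112, 293, 480, 560, 702, 760, 813, 849, 898, 920],
    "Machine 3": [112, 142, 191, 230, 280, 350, 424, 563, 710, 879],
}


def text_dot_plot(times, total_hours=1000, width=50):
    """Draw failures as '*' marks on a '-' timeline of `width` characters."""
    line = ["-"] * width
    for t in times:
        # Integer arithmetic maps each failure time to a character cell;
        # min() keeps a failure at exactly total_hours inside the line.
        cell = min(t * width // total_hours, width - 1)
        line[cell] = "*"
    return "".join(line)


for name, times in failure_times.items():
    print(f"{name}  0h |{text_dot_plot(times)}| 1000h")
```

Each character covers 20 hours at this width, so two failures closer together than that could share one mark. With these three data sets, all ten failures per machine land in separate cells.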

What If You Calculate MTBF?

Considering MTBF is so popular and widely used, you may feel compelled to calculate MTBF for these three examples.

It’s easy to do: each machine has run for 1,000 hours and enjoyed 10 failures, so all three machines have a 100-hour MTBF.

Machine 1 has a 100-hour MTBF
Machine 2 has a 100-hour MTBF
Machine 3 has a 100-hour MTBF
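For anyone who wants to reproduce the arithmetic, here is a short sketch. It assumes MTBF is simply total operating hours divided by the number of failures, with repair time ignored as stated earlier. The variable names are illustrative.

```python
# Failure times in operating hours since installation, copied from the tables above.
failure_times = {
    "Machine 1": [112, 198, 301, 425, 509, 615, 692, 820, 907, 989],
    "Machine 2": [112, 293, 480, 560, 702, 760, 813, 849, 898, 920],
    "Machine 3": [112, 142, 191, 230, 280, 350, 424, 563, 710, 879],
}

OPERATING_HOURS = 1000  # each machine has run for 1,000 hours

# MTBF = operating hours / number of failures; the failure times themselves never enter.
mtbf = {name: OPERATING_HOURS / len(times) for name, times in failure_times.items()}

for name, value in mtbf.items():
    print(f"{name}: MTBF = {value:.0f} hours")
```

Note that the calculation uses only the count of failures. When the failures happened is discarded entirely.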

The use of MTBF suggests there is no difference. We have reduced the information available for consideration. Using MTBF we would treat the three machines exactly the same.

The MTBF values limit the value of the data and undermine our ability to identify differences, take appropriate action, or understand what is happening.
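Even a crude split of the timeline recovers some of what the single number hides. The sketch below counts failures in the first and second 500 hours of operation. The 500-hour split point is an arbitrary choice for illustration, and the function name is hypothetical.

```python
# Failure times in operating hours since installation, copied from the tables above.
failure_times = {
    "Machine 1": [112, 198, 301, 425, 509, 615, 692, 820, 907, 989],
    "Machine 2": [112, 293, 480, 560, 702, 760, 813, 849, 898, 920],
    "Machine 3": [112, 142, 191, 230, 280, 350, 424, 563, 710, 879],
}

SPLIT_HOUR = 500  # arbitrary midpoint of the 1,000-hour observation period


def early_late_counts(times, split_hour=SPLIT_HOUR):
    """Return (failures at or before split_hour, failures after split_hour)."""
    early = sum(1 for t in times if t <= split_hour)
    return early, len(times) - early


for name, times in failure_times.items():
    early, late = early_late_counts(times)
    print(f"{name}: {early} failures in the first {SPLIT_HOUR} hours, {late} after")
```

Machine 1 splits 4 and 6, Machine 2 splits 3 and 7, and Machine 3 splits 7 and 3, yet all three report the same 100-hour MTBF.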

I suggest that is not a good metric.

So, stop using it.

Please feel free to use this example, maybe change it to fit your industry or situation. Help those around you understand their data.

Please let me know of any examples you use to make the point that MTBF is not helping. Let’s collect and post what helps us get the message across.

Comments

Gerald T says

October 12, 2016 at 10:05 AM

You might also have asked: “Why did they all fail after 112 hours?” !!

- Fred Schenkelberg says
  
  October 12, 2016 at 10:32 AM
  
  Good point, I didn’t get too creative with the first time to failure in my examples… yet I totally missed asking that question. Good eye. Cheers, Fred
  
Felix says

October 23, 2016 at 8:22 PM

Since MTBF is intended for the constant failure rate period, one could argue that at least two of those data sets would not have warranted characterization by MTBF in the first place.

- Fred Schenkelberg says
  
  October 26, 2016 at 9:21 PM
  
  Sure one could do that… I suppose. Yet, without the plotting and potentially fitting the data to a curve (line) would we really be able to tell if it really was showing a constant failure rate?
  
  Keep in mind there really isn’t such a thing as a constant failure rate period. There are many types of failure mechanisms, some with increasing or decreasing failure rates. The closest to a ‘flat part of the curve’ we see is when the changes are small enough to not matter much for the decisions the data is supporting. Assuming a constant failure rate doesn’t change the actual failure rate. Also, keep in mind that it is rare for a system to remain with little change in failure rate for very long.
  
  Cheers,
  
  Fred
  PS: been meaning to reply to this comment for a few days – pending the site recovering from an attack which took us offline for 3 days.
  
Jessie says

April 10, 2017 at 12:57 PM

Are these three machines all the same type/model?

- Fred Schenkelberg says
  
  April 10, 2017 at 4:49 PM
  
  They could be, yet these were just set up as examples. I have seen similar behavior when different teams install equipment, one with faulty instructions. One group fails early, the other wears out nicely. Cheers, Fred
