The MTBF Conspiracy Theory
When my son was young he asked a lot of questions that were difficult to answer. For example:
- Why is the sky blue?
- Why do I have to go to school?
- What is a conspiracy theory?
The first two were expected, yet the third set me back a little. How do you explain a conspiracy theory to a 5th grader? The dictionary-type definitions just seemed to confuse everyone. So, I made up a conspiracy theory.
I said, “Did you know North Dakota is not really a state?”
For those that haven’t heard of North Dakota, which on many maps is in the north central part of the US, that just reinforces the theory that it doesn’t exist.
My son, having recently memorized all fifty US states and their capital cities in school, said I was wrong. He knew the state was real because he could still recall the name of its capital city.
“Prove it,” was all I said in response.
“Well, it’s on the map of the country as a state.” My reply covered how maps change and are arbitrary. Anyone could have drawn the map, and how do we know it is accurate? Maybe the good folks in South Dakota paid the map maker to draw in the fictitious state of North Dakota.
“It’s listed in Wikipedia!” My reply was about how anyone can create a posting on the site. “What is the proof it’s actually true? Have you ever seen a car with ND plates or met someone from there?” He hadn’t.
My son knew I was only demonstrating the idea of a conspiracy theory. We had fun with it for years.
I was glad he never asked me,
“Why do people use MTBF?”
Just as with the blue sky, a shrug and a smile just wasn’t a good enough answer. There has to be a rational reason people use MTBF.
After writing about the perils of MTBF use for a few years, my current theory is it has to be a conspiracy.
The MTBF conspiracy theory revealed
Here’s what I think happened.
A bright engineer was tasked with estimating the reliability of a nuclear submarine’s electronics. He was given about a month to achieve this task, which is not enough time to conduct any testing. So, he gathered all the component failure rate data, tallied it up and reported the expected failure rate. {Parts count prediction}
The marketing department noticed the failure rate value and the word failure. The admission that the submarine might fail didn’t help to sell submarines, so they flipped the failure rate over, creating the average time between failures, or mean time between failures, MTBF.
The lower the failure rate the higher the MTBF went. Up was good. Failure is bad. {That’s how I think marketing folks think – sorry}
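The two steps in the story so far, tallying component failure rates and then flipping the total over, can be sketched in a few lines. This is only an illustration: the component names and failure rates are made up, the function names are hypothetical, and the sum assumes every part has a constant failure rate and that any single part failure fails the system.

```python
# Hypothetical component failure rates, in failures per million hours.
# These numbers are invented for illustration, not taken from any handbook.
component_failure_rates = {
    "resistor": 0.5,
    "capacitor": 1.5,
    "transistor": 3.0,
    "connector": 5.0,
}


def parts_count_failure_rate(rates):
    """Tally the component failure rates (series system, constant rates assumed)."""
    return sum(rates.values())


def mtbf_hours(failure_rate_per_million_hours):
    """Flip a failure rate (per million hours) over to get MTBF in hours."""
    return 1_000_000 / failure_rate_per_million_hours


system_rate = parts_count_failure_rate(component_failure_rates)
system_mtbf = mtbf_hours(system_rate)

print(f"Failure rate: {system_rate} failures per million hours")
print(f"MTBF: {system_mtbf:,.0f} hours")
```

Halve the failure rate and the MTBF doubles. It is the same information either way, just pointed in the direction marketing prefers.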
The engineers understood failure rates, and the math to create MTBF was pretty simple. So whatever, ’tis the same thing. Then management got involved.
The management team only wanted to read and talk about MTBF {again, the word ‘failure’ is bad thinking}. They set MTBF goals, they expected glowing reports of increasing MTBF values, and so on.
Then something really bad happened.
The US Military created a standard. And a company used a computer to automate the standard’s estimate of MTBF. Others did too. Now there was profit to be made by estimating MTBF, not reliability. So, they sold MTBF estimations. After all, that is what the management team wants, MTBF.
The military standard spawned many industry standards. The standards became parts of purchase contracts. MTBF flourished.
“What is your MTBF?” became an acceptable way to ask about reliability performance.
The murky bit of the theory involves why very few stood up to say, “Let’s not use MTBF, it is not very useful. Let’s use the probability of success over a duration (reliability) instead.” You may have said these very words or words to the same effect. And you felt the resistance.
- We always use MTBF.
- Everyone in our industry uses MTBF.
- The vendor only provides MTBF values.
My theory is we all know better {maybe not the marketing folks – sorry}, and we just do not feel able to overcome the resistance to change. We know we could do much better with better metrics, yet the backlash is unrelenting.
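For what it is worth, the better metric is not much harder to compute. Here is a minimal sketch of the probability of success over a duration, assuming a constant failure rate (the exponential distribution, which is the assumption usually hiding behind a lone MTBF value). The function name and the 10,000 hour MTBF are illustrative assumptions only.

```python
import math


def reliability(duration_hours, mtbf_hours):
    """Probability of no failure over the duration.

    Assumes a constant failure rate (exponential distribution),
    which real products often do not follow.
    """
    return math.exp(-duration_hours / mtbf_hours)


mtbf = 10_000  # hours, an illustrative value

for duration in (1_000, 5_000, 10_000):
    print(f"R({duration} h) = {reliability(duration, mtbf):.3f}")
```

Under this assumption, the chance of running failure free all the way to the MTBF is exp(-1), or only about 37%. That is a very different message than the MTBF number alone suggests.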
Just as that first engineer figured out a quick way to come up with a failure rate estimate, we too face the necessity to use MTBF. We do not have the time or energy to change our company or industry to stop using MTBF. So, we just do it.
It’s easy.
I don’t know if the spread of MTBF use is organized by a secret group or not. I suspect not. Yet the ease of use and avoidance of the word failure (or anything that smells like we would have to do statistics) conspired to trap us into using MTBF.
That’s my theory. If you know of any critical bits of information to support this theory, let me know. If we expose the conspiracy for what it is, it may just fade away. We then may get back to work doing reliability engineering and creating reliable products.
Louis Y. Ungar says
Conspiracy or not, MTBF is a clever way to hide failure as a fact of life. Instead of calculating the likelihood that something will fail soon, it tells people that it will not fail – at least not until the MTBF is reached. From my perspective as a test engineer, MTBF is a way to downplay the importance of test. If there are 100 failures expected in a million hours, we had better have tests. On the other hand, if we lead the manufacturer to believe that no failure is likely to occur for 10,000 hours of operation, we can expect to be beyond the warranty period, so it becomes the buyer’s problem. Additionally, no one is in a hurry to create tests to support the product in the field, because nothing bad will happen until the 10,000 hours are reached – presumably. Good article, Fred!
Fred Schenkelberg says
Thanks Louis, I may have enjoyed writing this one a little too much. Yet, I really do wonder what keeps so many using MTBF.
I think you echo my feeling that, in general, talking about failure, or testing to failure, or expecting failure, all have an unwanted business and marketing connotation.
Hey, you are right, MTBF is a great way to “hide failure as a fact of life.”
Well said.
Cheers,
Fred
Louis Y. Ungar says
As a follow-up, when we suggest built-in self-test using a few added components, we get the resistance that MTBF will decrease as a result of the testing circuit. This of course makes no sense, since the testing circuits do nothing until or unless the unit fails. That is clear when we talk about failure rates, but it is totally obscured when you think in terms of MTBF.
Fred Schenkelberg says
Funny isn’t it – suggest something that actually is helpful to the product reliability and customer satisfaction, and if the precious MTBF changes in the wrong direction – DENIED.
Come on – we need to help our peers and managers understand reliability and throw off the yoke of MTBF.
Cheers,
Fred
Mark Powell says
Fred,
Don’t forget that the use of MTBF was developed in the days before computers were available to do our jobs. I suspect that played a role.
Most of the things you do with MTBF work with simple four function calculators and tables that can be interpolated with four function calculators.
Then inertia takes over, and when computers become available they get programmed with the quick calculator techniques instead of what we know to be of higher fidelity. (Amazing that a company can program into a computer a four function calculator job and make money on it.) I have seen this phenomenon happen in astronomy, physics, and other areas of engineering.
Mark Powell
Fred Schenkelberg says
Thanks for the comment Mark – I agree that the initial widespread use of MTBF was prior to the advent of computers and modern calculators. It was a simplification necessary to do the job.
No excuse today though. Hence my thought there must be an evil plot holding industry after industry in the throes of MTBF.
I once heard that washing machines are less reliable today than in the 1950s – part of the reason is there’s not enough profit in selling only one machine to a family. Might be part of the conspiracy, no?
Peter Miskelly says
http://www.bbc.co.uk/news/world-us-canada-14142111 Just learned something about North Dakota. All conspiracies fall over when some rational thought is applied. Conspiracy theories rely on joining the dots, making a chain of connections after the event that did not exist before it. They look to the past and may even seem reasonable at first glance. But things that seem obvious with hindsight were not obvious at the time. Lusser came up with serial reliability before 1959. One of the problems you describe with MTBF is that it is an inverse function, like MPG is a measure of economy. If we flip it back and present it as a failure rate, it only solves that problem. Another problem is that the models used are not realistic. And what about serial reliability? To change direction you have to overcome inertia. Physics of failure is the way ahead, but getting people to standardize and produce comparable results is just one of the obstacles. Conspiracy theories are a product of the way our brains work, but they are never helpful. In this case, if someone cannot see the tongue in your cheek, it might even be counterproductive to your plans to overthrow the current setup.
I will keep my eye on this and see what happens.
Fred Schenkelberg says
Hi Peter, thanks for the link (didn’t know – my example was just made up. Will send the link to my kids to continue the discussion)
Also, yes, I did have a bit too much fun writing this post. And, I agree that overcoming the obstacles to fully embrace PoF will take some time.
Cheers,
Fred