This morning, after a discussion with a client, their motor vendor’s reliability engineer asked for a reference for a sample size calculation formula I had recommended, and I had a short email exchange with him. In my note with the references, I included an aside with a link to this site. He liked the site and agreed that MTBF is often misunderstood and not useful. He asked if the store sold much NoMTBF-logo merchandise (it doesn’t, by the way). I thought about how and why this site and the store exist, and that stirred my passion for this topic once again.
We need to do something to further the eradication of MTBF.
First, I suspect the ASQ CRE body of knowledge is coming up for review. Here are some ideas on what we can do to ‘fix’ the BoK.
- Create a petition and submit it to the ASQ Certification Board with a recommendation to drop the mention of, and emphasis on, MTBF.
- Encourage those that understand the proper use of MTBF to be on the team that rewrites the BoK – i.e. stack the deck.
- Barring the above, establish a boycott of the CRE BoK and certification process – i.e. not send them renewal fees or sit for the exam until it is ‘useful’.
Next, let’s consider standards. They often govern how our peers and colleagues think about reliability. If a standard relies on the assumptions related to MTBF, it should change. I am on the IEC group related to durability and regularly lobby the group to drop, minimize or modify any use of MTBF. It is a slow process. I’m sure you either use standards, or are on standards teams, that also use MTBF (and shouldn’t). So, what can we do here? Here are some ideas.
- Work from within the standards writing framework to rework the standards.
- We should list ‘offending’ standards and publicize the effort.
- We can again create petitions and send them to standards bodies to encourage the wholesale rewriting of ‘offending’ standards.
Product data sheets often use MTBF or Life (with determination of ‘Life’ being the unbiased estimator for the single parameter of the exponential distribution). As anyone reading this far down in this blog knows, that is a signal to any reliability professional to ask the vendor to fully explain what they mean and to ask for additional information, data, or justification of claimed values. How about we just insist they don’t use MTBF? Again, what can we do? This is a much larger issue, with literally thousands of authors of data sheets. Here are some ideas.
- Hold up a few as examples of really bad practice.
- Start a letter writing (email) campaign to vendors with ‘offending’ data sheets to increase awareness of their faulty practice, and that we are watching, want improvement and are available to help with education materials (i.e. this site).
- Write to company CEOs and expose the misinformation their teams are listing on data sheets.
This is just a place to start, to actually do more than simply discuss the issue amongst ourselves. I’m certainly open to more ideas and areas to press for change. New congressional bill and eventual law, new regulatory requirements, lawsuits (class action?), Superbowl advertisement, … What are your ideas, which do you like, what can you help to make happen?
John Pagendarm says
I don’t agree with eradicating MTBF from the BOK. Too much history and too many people use it. I think that a short statement that it is not considered best practice (and why) is appropriate. – John
Fred Schenkelberg says
Hi John,
If only we all used it properly (which would then nearly totally eradicate it), it would be fine.
You offer history, and so many people using MTBF, as the reason to continue to support and encourage its use.
MTBF is so often inappropriate and wrong that ‘history and common use’ is simply a ludicrous rationale. That many people have, for a long time, been making decisions using very poor representations of what is really happening with their products is IMHO not a very sound reason.
History – there was a time when doing the math related to reliability was made possible by assuming a constant failure rate. Given the tools we have on our desktops today, doing multiple integration is as simple as adding a row of numbers. We are no longer limited by our computation power for the vast majority of our reliability tasks. Don’t limit yourself with history.
Everyone does it – In this case, as reliability professionals we should only use MTBF when it is the right thing to do. I am very hard pressed to suggest even one situation where MTBF is meaningful or useful for decision making. Not when so many other tools and approaches and metrics are so much more meaningful and useful. Just because ‘everyone’ does it, doesn’t make it right or useful or correct.
As far as I can tell there is no rational, justifiable, logical or other reason for anyone to ever use MTBF. We can and should do better.
Try to not use MTBF for just one week. Use probability of success and duration in its place. You most likely will discover the ability to understand and make better decisions.
By removing MTBF from the CRE BOK it signals and sets the expectation that we as a profession should not, and do not, use MTBF. Let’s move on and improve this profession – eradicating the use of MTBF is one important step.
John Pagendarm says
Fred,
At this moment I am reading a reliability assurance handbook and it uses MTBF, as do 1000’s of standards, procedures, and other documents. They are not going away. I agree with your desire to eradicate MTBF. (The button is on the front of my cubicle.) However, eliminating all reference to MTBF in the BOK will change nothing. Using each occurrence to tell why we don’t use it will allow us to watch it disappear, naturally.
There is exactly one GREAT reason to use MTBF. It is why I use it every day. My customer requires it.
The customer is ALWAYS the customer.
Fred Schenkelberg says
Hi John,
Besides removing it from the CRE BOK, we are actively working to remove it from standards. IEEE and IEC standards related to reliability are slowly being revised and are dropping references to, or specifically calling out, MTBF. I know as I’ve been on these teams for a few years.
As for the argument that it is in such and such standard, recall that many Mil Standards related to reliability are significantly out of date and technically not to be used. Many of the guidelines are likewise out of date and use techniques from the 60’s.
If we, each and every one of us, don’t actively work to stop using MTBF, it will persist.
The customer is the customer, and may require MTBF. They often, as you know, do not understand MTBF nor understand what it means or how to use it properly. It often does not accurately relate to the reliability performance they are seeking.
I suggest that you and anyone else with said ‘customers’ do good reliability engineering and not use MTBF. At the end of the day one can quickly calculate MTBF from other work, list the duration over which it applies, and list any conditions or assumptions that apply when using MTBF.
If the customer really wants a product that works for the duration of a mission or some other period of time, then stating the probability of success over that duration is much more meaningful than MTBF.
Not using MTBF permits those creating products or systems to actually meet or exceed the reliability expectations of the customer. MTBF-only analysis provides meaningless or inaccurate information, often thwarting the ability to meet the needs of said ‘customer’.
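To make the comparison concrete, here is a minimal sketch of how a quoted MTBF translates into a probability of success over a duration, and back again. It assumes a constant hazard rate (the exponential model), which is exactly the assumption questioned in this thread. The function names and the 50,000 hour and one year figures are hypothetical, chosen only for illustration.

```python
import math


def reliability_from_mtbf(mtbf_hours, duration_hours):
    """Probability of surviving duration_hours, ASSUMING a constant hazard rate."""
    return math.exp(-duration_hours / mtbf_hours)


def mtbf_from_reliability(reliability, duration_hours):
    """MTBF implied by a reliability at a duration, under the same assumption."""
    return -duration_hours / math.log(reliability)


# Hypothetical numbers, for illustration only.
mtbf = 50_000.0     # quoted MTBF, in hours
mission = 8_760.0   # one year of continuous operation, in hours

print(f"R({mission:.0f} h) = {reliability_from_mtbf(mtbf, mission):.3f}")
print(f"R(t = MTBF)  = {reliability_from_mtbf(mtbf, mtbf):.3f}")
```

Under this assumption only about 37% of units survive to the MTBF itself, which is one reason a reliability stated with its duration tends to be read correctly where a bare MTBF is not.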
The customer is not always right. When they specify reliability ask them what they really mean. I have found that it is often not even close to what they ‘specified’.
History, everyone does it, it’s in all the standards, and customer requirements around MTBF are all nonsense and you know it. Do good reliability engineering, understand and model the data accurately. That most often, and IMHO always, precludes using MTBF in any manner.
You might be getting the idea that I will not accept a poor excuse to use MTBF. And I have yet to see the time to failure data that supports the use of a constant failure rate, and in the cases where I can imagine it would be appropriate, it is much more clearly stated using probability of success and duration.
No MTBF,
Fred
Mark says
Fred,
As usual, you are spot on. Perhaps John’s more gradual approach is because of the immensity of this elephant. Hard to eat all at once.
In my own position as a reliability engineer at GE aviation, I have a customer that insists on the use of MTBF. Were I to force a real metric, I would be squashed and have accomplished nothing. As an ASQ certified Reliability engineer, if I were to stop recertifying until ASQ fixed the BoK, I would lose my certification and perhaps my job. Again, nothing accomplished. So I think the all or nothing approach may not be as effective as something more along the line of the approach you offer above. Question the customer on what they mean and what they really want. Educate them on how to specify what they want and if they insist on MTBF, give them what we know they want along with the MTBF they request. Eventually, we will win them over.
Mark
Fred Schenkelberg says
Hi Mark,
thanks for the comment – I agree that a direct assault is most likely not going to work, yet a set of determined professionals applying steady pressure will prevail.
The numbers, the money, and decision risks, etc all point to using something other than MTBF — reliability works well, imho.
cheers,
Fred
Chet Haibel says
Hi Fred:
I like the original idea of your site — devoted to the eradication of the misuse of MTBF. There is nothing wrong with MTBF per se, it is the misuse that is wrong.
One could just as easily say there is something wrong with the exponential distribution. Its mode is 0 time; the highest failure rate occurs at the beginning. Then its last few failures are so late, that the mean is stretched out to when over 63% of the failures have already occurred.
It is so spread out that its standard deviation is equal to its mean. Taking the sometimes useful idea that the mean is the central tendency and applying it to the exponential distribution shows that it isn’t a useful idea in this pathological case. The mean and standard deviations are just mathematical definitions that don’t serve our intuition for the exponential distribution.
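The properties described above follow directly from the exponential distribution's closed-form expressions and can be checked in a few lines. This is only a sketch; the function name and the mean life of 1,000 hours are hypothetical and not from the discussion.

```python
import math


def exponential_summary(mean_life):
    """Summary figures for an exponential time-to-failure distribution."""
    rate = 1.0 / mean_life  # constant hazard rate (lambda)
    return {
        # Fraction of units failed by the time the mean is reached: 1 - e^-1
        "fraction_failed_by_mean": 1.0 - math.exp(-rate * mean_life),
        # Standard deviation of the exponential is 1 / lambda, same as the mean
        "std_dev": 1.0 / rate,
        # Median is ln(2) / lambda, well before the mean
        "median": math.log(2.0) / rate,
        # The density f(t) = lambda * exp(-lambda * t) peaks at t = 0
        "density_at_zero": rate,
        "density_at_mean": rate * math.exp(-1.0),
    }


# Hypothetical mean life of 1,000 hours, for illustration only.
for name, value in exponential_summary(1_000.0).items():
    print(f"{name}: {value:.4g}")
```

The fraction failed by the mean does not depend on the mean life chosen: it is always 1 - 1/e, a little over 63%.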
Perhaps more importantly, the exponential distribution models a disappearing failure behavior, failures whose hazard rate is constant (whose failure rate is not constant at all). In the last several decades, electrical component failure rates have been driven down so much that very few reliability issues are random-in-time failures.
This is due to larger and larger scale integration, pick-and-place robotic handling, and surface mount processes that have been carefully engineered using designed experiments. Indeed infant mortality (early-life) and wear-out (end-of-life) failures dominate. So exponential distribution mathematics are simply inappropriate.
Chet Haibel
Fred Schenkelberg says
Hi Chet,
thanks for the comment and glad to know this blog is Sunday morning reading material for you. You are correct that there is nothing, per se, wrong with MTBF other than the misuse and misunderstanding that abound in our profession and beyond.
We do need to use appropriate measures that promote understanding, rather than something that is ‘easy’ to use and calculate.
cheers,
Fred
Victor Martins, CRE says
This is a subject dear to my heart, and therefore, I feel compelled to add to the discussion.
First, I will say that I have encountered many professionals who do not understand the correct meaning and proper use of MTBF. However, MTBF is a correct statistical term and it should not be eradicated. As a tool for the Reliability Engineer (RE), it in fact has many uses; one that I find particularly useful is allocating a higher level system reliability requirement to its lower level subsystems/assemblies during the design process. Thus, instead of eradicating the term, the RE should make the effort to educate those he/she interacts with, whenever the term is being used, as to the proper use of this important parameter. I find that it is incumbent upon the RE to help the audience understand the true meaning and purpose of MTBF when equipment life is being defined and decisions are being made based on this parameter. I don’t think that the term MTBF or its inverse (Failure Rate) should be used to define product life without an understanding of its probability of success and enveloping confidence level. Thus, my bottom line is: do not eradicate MTBF; instead, educate its users on its proper use.
Victor Martins, CRE
Fred Schenkelberg says
Hi Victor,
I agree that MTBF is not bad in and of itself. Yet, I do disagree that it is an important or useful measure.
For apportionment I prefer using reliability at a set of durations. For example, if the system expects 95% probability of survival over 5 years, then each of the five major subsystems requires at least 99% reliability over 5 years (assuming a series system). The math is easy, it is easy to understand, and there are no convoluted conversions, exponential relationships, or misunderstandings.
I find it much easier to work with customers and clients using terms and practices that do not lend themselves to misunderstanding. I do agree that whenever we see the use of MTBF we should ask how it is being interpreted and whether it is proper for the circumstance or not.
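The arithmetic behind that example can be sketched as follows, assuming a series system of independent subsystems so that the system reliability is the product of the subsystem reliabilities. The function names are hypothetical, and equal apportionment is just one simple allocation scheme, not necessarily the one used in practice.

```python
import math


def series_reliability(subsystem_reliabilities):
    """Reliability of a series system of independent subsystems (product rule)."""
    return math.prod(subsystem_reliabilities)


def equal_apportionment(system_target, n_subsystems):
    """Per-subsystem reliability needed if every subsystem gets the same goal."""
    return system_target ** (1.0 / n_subsystems)


# Five subsystems at 99% each over the same 5 year duration.
system = series_reliability([0.99] * 5)
print(f"System reliability at 5 years: {system:.4f}")

# Exact equal allocation for a 95% system goal across five subsystems.
needed = equal_apportionment(0.95, 5)
print(f"Each subsystem needs at least: {needed:.4f}")
```

Five subsystems at 99% give a system reliability just above 95%, and the exact equal allocation works out slightly below 99%, so 99% per subsystem is a convenient rounded goal that leaves a small margin.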
Thanks for the comment.
cheers,
Fred