MTBF: According to a Component Supplier
This one made me scratch my head and wonder. Did I read this right?
A reader sent me an excerpt of a document found on Vicor’s site.
“Reliability is quantified as MTBF (Mean Time Between Failures) for repairable product and MTTF (Mean Time To Failure) for non-repairable product. A correct understanding of MTBF is important. A power supply with an MTBF of 40,000 hours does not mean that the power supply should last for an average of 40,000 hours. According to the theory behind the statistics of confidence intervals, the statistical average becomes the true average as the number of samples increase. An MTBF of 40,000 hours, or 1 year for 1 module, becomes 40,000/2 for two modules and 40,000/4 for four modules…”
source: http://www.vicorpower.com/documents/quality/Rel_MTBF.pdf
The excerpt came with the following note and question:
“In my opinion this is completely wrong but as I’m fledgling in this subject I’m sensitive to any statements like this.
Could you be so kind and help me a bit on it?”
A Quick Review of the Excerpt
MTBF and MTTF are common metrics used (in my opinion) erroneously to quantify reliability. The opening sentences use the commonly understood definitions of MTBF for repairable systems and MTTF for non-repairable items.
It is very hard to argue with:
“A correct understanding of MTBF is important.”
As you know, I totally agree with this statement. Except the author then attempts to describe an understanding of MTBF based on a bit of statistical theory involving confidence intervals.
Let me paraphrase. If I have a component that has an MTBF of 40,000 hours, that does not mean the part will survive on average 40,000 hours. Again, I agree with this statement if the average or mean assumes a normal distribution, where about half of the units would outlast the mean. MTBF, however, is the statistical mean (expected value) of an exponential distribution. If assuming a constant hazard rate (exponential distribution), then the statement is technically false: the average life is exactly the MTBF, even though only about 37% of units survive that long.
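A minimal sketch of that point, assuming the constant hazard rate (exponential) model. The function name and the printed summary are mine for illustration, not from the Vicor document; only the 40,000 hour figure comes from the excerpt.

```python
import math


def reliability(t_hours, mtbf_hours):
    """Probability a unit survives to t_hours under a constant hazard rate."""
    return math.exp(-t_hours / mtbf_hours)


mtbf = 40_000.0  # hours, the figure quoted in the excerpt

# Fraction of units still working when the clock reaches the MTBF: exp(-1).
surviving_at_mtbf = reliability(mtbf, mtbf)

# Time by which half of the units have failed (the median life).
median_life = mtbf * math.log(2)

print(f"Surviving at MTBF: {surviving_at_mtbf:.1%}")
print(f"Median life: {median_life:,.0f} hours")
```

Under this assumption the mean life really is 40,000 hours, yet most units fail before reaching it, which is part of why the "mean" in MTBF misleads people.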
The very confusing part comes next. The author seems to invoke the central limit theorem or the law of large numbers (I’m not sure which) to suggest that with more units involved or produced (?) the MTBF declines. The example says that with one sample the MTBF is 40,000 hours, with two items the MTBF is divided by 2, with 3 items it is divided by 3, and so on…
Following this logic, if I’m understanding the concept described, then when I consider the reliability of 40,000 parts, the MTBF would be 40,000 / 40,000, or 1 hour.
Huh?
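To make the arithmetic concrete, here is a rough sketch. The first function is just the division the excerpt describes. The simulation is my own assumption about one possible reading, not something the document states: if lifetimes are exponential, the mean time to the first failure among n units running together is MTBF / n, while the mean life of each individual unit does not change. All names are hypothetical.

```python
import random


def excerpt_rule(mtbf_hours, n_units):
    """The division the excerpt describes: MTBF / number of modules."""
    return mtbf_hours / n_units


def simulate(mtbf_hours, n_units, trials=20_000, seed=1):
    """Return (mean life per unit, mean time to first failure in the group).

    Assumes independent units with exponentially distributed lifetimes.
    """
    rng = random.Random(seed)
    unit_total = 0.0
    first_total = 0.0
    for _ in range(trials):
        lives = [rng.expovariate(1.0 / mtbf_hours) for _ in range(n_units)]
        unit_total += sum(lives) / n_units
        first_total += min(lives)
    return unit_total / trials, first_total / trials


rule_for_fleet = excerpt_rule(40_000, 40_000)  # 1.0 hour
mean_unit_life, mean_first_failure = simulate(40_000, 4)

print(f"Excerpt's rule for 40,000 parts: {rule_for_fleet} hour")
print(f"Simulated mean life per unit (4 units): {mean_unit_life:,.0f} hours")
print(f"Simulated mean time to first failure (4 units): {mean_first_failure:,.0f} hours")
```

If that reading is what the author intended, the divided figure describes how soon the group sees its first failure, not the MTBF of any one power supply.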
Now I’m Curious
I read and reread this paragraph. Then I found the document online at the Vicor site and read the paragraph in context. I think I’m missing something here, as I do not understand the meaning. Or it’s just not correct.
Reading the rest of the document is akin to a game of hide and seek. I routinely found correct, not so correct, and just plain wrong definitions, explanations, and descriptions of reliability concepts.
The MTBF interpretation is a new one for me. Thus, I had to share it.
Vendor’s Quality Document for Customers
Given the document is listed under the quality section and provided as a support document, it gives me pause on how to interpret the quality and reliability data of their components. The parts may be well designed and function as expected, yet the reliability figures and claims, if created as well as they describe reliability concepts, may not provide useful reliability information about their components and systems.
We, as reliability professionals, rely on vendors to have fully characterized and assessed the reliability performance of their products. We often do not have sufficient resources to evaluate, model, test, etc., every vendor’s claim.
It appears that we may have to do our own life testing going forward, especially when the vendor so poorly describes even basic concepts concerning reliability engineering. I do not know the author of the document, nor anything about the vendor or their power modules. Yet, if I, like the person that sent me the excerpt and question, rely on the information and reliability claims of this vendor, we may have one or more surprises concerning reliability performance.
When you see something that just doesn’t look right, or you are not sure about a statement or claim, ask questions. Check claims and descriptions against a reputable resource. Ask for a second reading on a technical forum, and let a few of your peers consider the claims and maybe provide assurance or corrections (along with supporting references).
If you run across an egregious example of reliability engineering misunderstanding — let me know. I’d like to expose documents out there that are just not helping us or our peers understand reliability engineering.
Paul Franklin says
Oh my! The quoted explanation seems to be an argument in favor of using MTBF as a Poisson rate. That could, of course, be valid or useful, but there are assumptions that have to be tested.
Thanks for sharing this one. It’s a keeper.
Fred Schenkelberg says
Thanks Paul, yeah, seems to be a bit off in the explanations provided. cheers, Fred
Kevin Walker says
A classic – thanks! The good news is, ignorance can be fixed. The sad news – what are the chances it will be? Given the attempt at presenting a result, with no operating parameters around it, what would one bet that the 40,000 came from a handbook prediction?
While we’ve never used their products, they have a good reputation for reliability based on what I know of them.
Interesting
Fred Schenkelberg says
Thanks for the note, Kevin. Agreed, it is possible to correct ignorance. I don’t know anything about the products this company offers and make no judgement on their reliability performance. The white papers and quality documents offered to their customers and the public are a bit concerning though.
Cheers,
Fred
Dan Burrows says
The last part, on the statistics of confidence intervals, is not what is really happening, and the author didn’t explain it correctly (in my opinion). I think what the author was trying to get at is how MTBF is often calculated by taking the total survival time divided by the number of units being tracked or tested. In this case, you can track or test a single unit for 40,000 hours and, if it survives failure free, declare an MTBF of 40,000 hours. You could also track or test 40,000 units for 1 hour and, if they all survive failure free, also declare an MTBF of 40,000 hours. Thus the point of this web site, No MTBF.
For me, my advocacy of No MTBF is actually No Misuse of MTBF. MTBF or MTTF represent the expected failure rate of an item over its expected useful life, but the “mean” terminology makes people think that this is mean life. I usually describe things in Failure rate over expected usage life and also provide the expected usage life so that people don’t think an item that will only last 10 years will live for 100 or 1,000 years.
Fred Schenkelberg says
Hi Dan, nicely put, and yes, you got it right: this misstatement or misunderstanding is part of why the site NoMTBF exists. cheers, Fred
tim newman says
I have the same position. I don’t mind MTBF per se. It’s its misuse that I have an issue with.