Accendo Reliability

Your Reliability Engineering Professional Development Site

by nomtbf

When to use something other than MTBF

As you may suspect, I would say you should never use MTBF.

Given MTBF is prevalent, we may find avoiding MTBF nearly impossible.

Given a choice

When talking about reliability goals, just use reliability. Say what you mean in clear language. For example, if you want 95% of units to survive without failure for 5 years, then say the reliability goal is 95% survival over 5 years (include function and environment if they are not clear from the context).

When asking for reliability information, ask for what you want. If you want the device to last five years without failures or with very few failures, then just saying 5 years can be misunderstood. Couple the duration with the probability of survival to be very clear.

When specifying a test, also be clear: the goal or objective is one statement, and the confidence or statistical uncertainty elements are something different. Keep them separate.
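As one illustration of keeping the two statements separate, the commonly used success-run relation for a zero-failure pass/fail test, n = ln(1 - C) / ln(R), takes the reliability goal R and the confidence C as independent inputs. The sketch below is only a hedged example: the function name and the confidence levels are assumptions chosen for illustration, and it assumes every unit is tested for the full goal duration with no failures allowed.

```python
import math


def success_run_sample_size(reliability_goal: float, confidence: float) -> int:
    """Units to test with zero failures allowed (success-run relation).

    reliability_goal: the goal statement, e.g. 0.95 survival over the duration.
    confidence: the statistical statement, e.g. 0.90, kept as a separate input.
    """
    if not (0.0 < reliability_goal < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("both inputs must be strictly between 0 and 1")
    n = math.log(1.0 - confidence) / math.log(reliability_goal)
    return math.ceil(n)


# Same goal (95% survival), different confidence levels (assumed values).
for c in (0.60, 0.90, 0.95):
    print(f"goal 95%, confidence {c:.0%}: {success_run_sample_size(0.95, c)} units")
```

The goal does not move when the confidence changes; only the amount of testing needed to support it does.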

When not given a choice

When only given MTBF or only asked for MTBF values, what should you do? Well, use the value and ask some questions. Remember that MTBF all by itself is just an indication of the average failure rate. It is not a duration and does not convey how long or over which period of time the failure rate applies.

I cringe when I hear someone comment on a 50,000 hour MTBF value with, “That is about 5 years, which is long enough for our application.” We really should state MTBF as hours per failure to be a bit clearer.
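A quick check shows why reading MTBF as a duration misleads. The sketch below assumes a constant failure rate (exponential distribution) and continuous operation at 8,760 hours per year; the function and constant names are made up for this example.

```python
import math

HOURS_PER_YEAR = 8760  # assumes 24 hours/day, 365 days/year operation


def reliability(t_hours: float, mtbf_hours: float) -> float:
    """Probability of surviving t_hours under a constant failure rate."""
    return math.exp(-t_hours / mtbf_hours)


mtbf = 50_000
# Fraction of units still working when the clock reaches the MTBF value.
print(round(reliability(mtbf, mtbf), 3))                # 0.368
# Fraction still working after 5 years of continuous operation.
print(round(reliability(5 * HOURS_PER_YEAR, mtbf), 3))  # 0.416
```

Under these assumptions, fewer than half of the units would be expected to reach 5 years without failure, which is rarely what "long enough for our application" is meant to convey.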

So, when given 50,000 hours MTBF for an item, I first consider over what duration this applies (if I don’t know, it’s time to ask more questions). So, let’s say we have an electronics box with a fan. It is expected to operate full time for two years, or 17,520 hours of operation.

If the fan assembly data sheet has a listed MTBF of 50k hours, and it’s the only information I have available, I can estimate the reliability directly.

$$
\begin{array}{l}
R(t) = e^{-t/\theta} \\
R(17{,}520) = e^{-17520/50000} \approx 0.70
\end{array}
$$

This is the reliability function for the exponential distribution, where θ is the MTBF, and it results in an estimated 70% of units surviving over 2 years. If that fraction failing (about 30%) is acceptable, then use the fan; if not, find a better fan or a better estimate of the reliability of the fan.
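The fan calculation can be sketched in a few lines, along with the reverse conversion from a reliability goal back to the MTBF it would imply. Both functions assume a constant failure rate; the function names and the 95% goal are assumptions for illustration, not values from a data sheet.

```python
import math


def reliability_from_mtbf(t_hours: float, mtbf_hours: float) -> float:
    """R(t) = exp(-t / theta), with theta taken as the MTBF."""
    return math.exp(-t_hours / mtbf_hours)


def mtbf_for_goal(t_hours: float, reliability_goal: float) -> float:
    """MTBF that would give the stated reliability at t_hours."""
    return -t_hours / math.log(reliability_goal)


mission_hours = 2 * 8760  # two years of full-time operation, 17,520 hours

print(f"{reliability_from_mtbf(mission_hours, 50_000):.2f}")  # 0.70
print(f"{mtbf_for_goal(mission_hours, 0.95):,.0f} hours")
```

Under the same assumption, a fan meeting a 95% two-year goal would need an MTBF of roughly 340,000 hours, nearly seven times the data sheet value.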

When only given MTBF, do the math and convert the value into something that is much easier to understand.

Do the same when asked for MTBF. Provide reliability – probability of success over a specific duration. Again, make it clear.

Comments

  1. Paul says

    September 17, 2014 at 9:40 AM

    Fred, two points seem relevant. There’s no reason you have to assume that a constant failure rate applies. You might ask how your view of reliability would change if you knew that failures were distributed according to the normal or Weibull distributions. Since you know (in your example) that you’re dealing with a fan, you can deal with various assumptions about how wear-out might happen.
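To get a feel for how much the distribution assumption matters, the sketch below holds the mean life at 50,000 hours and varies the Weibull shape parameter. Treating the data sheet MTBF as the Weibull mean is itself an assumption, and the shape values are hypothetical, not fan data.

```python
import math


def weibull_reliability(t_hours: float, mean_life: float, beta: float) -> float:
    """Weibull reliability at t_hours for a given mean life and shape beta.

    The scale eta is chosen so the distribution mean equals mean_life:
    mean = eta * Gamma(1 + 1/beta).
    """
    eta = mean_life / math.gamma(1.0 + 1.0 / beta)
    return math.exp(-((t_hours / eta) ** beta))


mission_hours = 17_520  # two years of continuous operation
for beta in (0.7, 1.0, 2.0, 3.0):  # hypothetical shapes; 1.0 is constant rate
    r = weibull_reliability(mission_hours, 50_000, beta)
    print(f"beta = {beta}: R(2 years) = {r:.3f}")
```

With the same mean, a wear-out shape (beta above 1) gives a noticeably higher two-year reliability than the constant-rate estimate, while an early-life shape (beta below 1) gives a lower one.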

    If you have nothing else to go on but a statement concerning MTBF, you may also be in a position to recommend testing. If you’re lucky enough to have applicable test data in your filing cabinet, you might be able to reuse it in some way. One way that seems particularly appropriate is to estimate the conditional probability of failure based on age. Nelson has suggested some techniques.

    A supplier recently suggested that a card has an MTBF of millions of hours, or hundreds of years. Probably the only way I can use that information is to estimate replacement rate and sparing needs for a large fleet (i.e., 1 replacement in X years if I own 100 of these cards). Fortunately, there was field data, and it turned out that we could estimate time to first failure, and knowing the age of a card, we could estimate the conditional probability of failure in any time interval we wanted. Of course it turns out that the median time to first failure of this card (and the 63rd percentile as well, which is the characteristic life for Weibull and MTBF for constant failure rate) can be estimated, and it’s on the order of high single digit years. It still means that this card is not likely to fail during the technology cycle, but we do have a data-driven approach here.
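The conditional probability of failure given age can be sketched as follows, assuming a Weibull fit to the field data. The characteristic life of 8 years and the shape of 2 are hypothetical placeholders, not the values from the fleet described above.

```python
import math


def weibull_reliability(t: float, eta: float, beta: float) -> float:
    """Probability of surviving to age t (same time units as eta)."""
    return math.exp(-((t / eta) ** beta))


def conditional_failure_probability(age: float, interval: float,
                                    eta: float, beta: float) -> float:
    """P(fail within the next interval | survived to age)."""
    r_now = weibull_reliability(age, eta, beta)
    r_later = weibull_reliability(age + interval, eta, beta)
    return 1.0 - r_later / r_now


ETA_YEARS = 8.0  # hypothetical characteristic life
BETA = 2.0       # hypothetical wear-out shape

# About 63% of units have failed by the characteristic life, whatever beta is.
print(round(1.0 - weibull_reliability(ETA_YEARS, ETA_YEARS, BETA), 3))  # 0.632

for age in (0, 2, 5):
    p = conditional_failure_probability(age, 1.0, ETA_YEARS, BETA)
    print(f"age {age} years: P(fail in next year) = {p:.3f}")
```

With a shape of 1 the result is the same at every age, which is exactly the information a lone MTBF value cannot distinguish from a wear-out pattern.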

    There is every reason to believe that electronics wear out. Knowing this allows an engineer to work backwards from various sets of assumptions to get a feel for what might happen in real life. My general expectation is that over a reasonable time (say up to 5 years, more or less), most problems will be fairly robust to the assumptions used, and this provides some structure for taking appropriate action. If the analysis shows that the design is sensitive to assumptions, then that’s useful too, and it tells you where to do more work.

  2. Fred Schenkelberg says

    September 17, 2014 at 9:48 AM

    Hi Paul,

    Great comments – thanks.

    If all you have is MTBF then you are correct and one should look for more information. Sometimes we have field data, maybe a literature search, etc. All good. The key is that MTBF in and of itself is not all that useful. We should build the reflex that seeing MTBF means we need more information.

    Even estimating spares for a fleet is not all that useful using MTBF alone. It may provide a gross number, yet we may be very interested in when those spares are most needed. Like you said, electronics do tend to wear out, and there is the common issue of factory and supply chain induced problems (early life failures). Given only MTBF, we do not know if we need the parts early or late in the life of the units.

    The million hour MTBF does not mean 100 years of life; it means there is a 1 / 10^6 chance of failure each hour, and that is all. If they tested 1 million boards for an hour, they may claim they have test data to support the claim. More likely they tested a few boards for thousands of hours to tally up to one million total hours, or they just did a parts count prediction.
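The arithmetic behind the million hour figure can be sketched as below, assuming the constant failure rate really holds over the whole period (a strong assumption) and continuous operation. The function names are illustrative only.

```python
import math

HOURS_PER_YEAR = 8760  # continuous operation, 24 hours x 365 days


def hourly_failure_probability(mtbf_hours: float) -> float:
    """Chance of failing during any one hour, given survival so far."""
    return -math.expm1(-1.0 / mtbf_hours)


def survival_probability(t_hours: float, mtbf_hours: float) -> float:
    """Probability of surviving t_hours under a constant failure rate."""
    return math.exp(-t_hours / mtbf_hours)


mtbf = 1_000_000
print(f"per-hour failure probability: {hourly_failure_probability(mtbf):.2e}")
print(f"survive 5 years:   {survival_probability(5 * HOURS_PER_YEAR, mtbf):.3f}")
print(f"survive 100 years: {survival_probability(100 * HOURS_PER_YEAR, mtbf):.3f}")
```

Even if the constant rate did hold for a century, only about 42% of units would be expected to survive 100 years, so the MTBF value says nothing like "lasts 100 years".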

    What is missing is the duration over which the MTBF is valid. If it’s just five years and ignores early life failures, then it means the unit is probably pretty robust. If the application is a 30 year solar panel installation, I would want more information.

    Cheers,

    Fred

    • Paul says

      September 17, 2014 at 10:56 AM

      Fred,
      I wonder if MTBF can be meaningful unless the constant failure rate distribution applies. If you were to integrate the hazard function and plot failures as a function of the integral, then you could tell. If that curve is concave upward, then infant mortality or reliability growth is indicated. If it is linear, then there is a constant failure rate. If it is concave downward, then wear-out is indicated.

      In real life, things aren’t usually so simple and it’s fairly rare that a single distribution captures the life cycle reliability experience. There are often competing failure modes.

      As you point out, a prediction is reasonably worthless. Testing 10,000 units for 100 hours isn’t the same as testing 1000 units for 1000 hours or 100 units for 10,000 hours. A reliability demonstration’s validity ends at the clock time of its conclusion. There would quite likely be different failure counts in each of those 3 test plans.
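The three test plans above each total one million unit-hours, yet the expected failure counts can differ widely once the failure rate is not constant. The sketch below assumes a Weibull life distribution with a hypothetical characteristic life of 50,000 hours and failed units not being replaced; the shape values are illustrative only.

```python
import math


def expected_failures(units: int, hours: float, eta: float, beta: float) -> float:
    """Expected number of units failing by the end of the test (no replacement)."""
    fraction_failed = 1.0 - math.exp(-((hours / eta) ** beta))
    return units * fraction_failed


ETA_HOURS = 50_000  # hypothetical characteristic life
plans = [(10_000, 100), (1_000, 1_000), (100, 10_000)]  # (units, hours each)

for beta in (0.5, 1.0, 2.0):  # early-life, constant rate, wear-out
    counts = [expected_failures(n, h, ETA_HOURS, beta) for n, h in plans]
    print(f"beta = {beta}: " + ", ".join(f"{c:.1f}" for c in counts))
```

With a constant rate the three plans give nearly the same count, roughly 18 to 20 failures each. With an early-life shape the short, wide test sees the most failures, and with a wear-out shape the long, narrow test does, which is why total hours over failures hides what the test actually exercised.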

      Early failures can have lots of different causes: defects either introduced or not eliminated during manufacture, fast growing defects under field stress that cannot be detected during manufacture, installation errors, and so forth. The reliability engineer needs to be aware of whatever may apply in a given situation, and it’s reasonably unlikely that any measure (predicted or test-based) that is total time over number of failures will help control them. Deeper knowledge is simply required, or a willingness to live with those failures. Partnerships with quality engineers, procedure planners, and so forth are useful here. In any event, it’s more than just reliability.

      More and more, I’m tending to a data driven approach. I am challenging suppliers with questions like “what do you mean by an MTBF of a hundred thousand hours?” It cannot possibly have much to do with lifetime, or true probability of failure (given that I’m pretty sure right out of the gate that there’s a measurable return rate given even 1 or 2 years of operation). I’m also asking “how do you know?” I am asking what failure modes I can expect to see, and how “MTBF” is computed. If I get back total time over number of failures or a Telcordia prediction, then I know there’s a lot more work to do. If there’s no FMEA, then the supplier hasn’t thought about the product carefully. And I’m asking how the design is robust in the event expected failures occur. The real reason I’m interested in the rate of failures is that I want to understand how to plan for maintenance or replacement. If I can prevent failures by partnering with quality engineers, design engineers, and procedure planners, then so much the better. It isn’t possible to prevent 100% of failures, of course. In the age of networks and cloud computing, we have to be thinking about a lot more than dividing the number of failures into time. We have to be asking the right questions and getting useful answers (though this is often like pulling teeth with no anesthesia).

  3. Fred Schenkelberg says

    September 17, 2014 at 11:02 AM

    Hi Paul,

    I’d say try it. If given just MTBF, then the hazard function is constant and its integral is just a straight line, so it will not show early life or wear-out patterns.

    I totally agree that if you have the data, use it first. The simplifying assumptions involved with using MTBF often obscure the useful information contained within the data.

    Cheers,

    Fred

© 2025 FMS Reliability · Privacy Policy · Terms of Service · Cookies Policy