Accendo Reliability

Your Reliability Engineering Professional Development Site


by nomtbf

Popular Reliability Measures and Their Problems


MTBF

Mean time between failures (or mean time before failure) is a very common measure. The usual definition describes MTBF as a reliability measure calculated by tallying operating hours and dividing by the number of failures. Intuitively, this is the average time until a failure occurs. Mathematically, it is the inverse of the failure rate. It is generally used for repairable systems.

Readers of NoMTBF know MTBF is commonly misunderstood as a failure-free period, the average of a normal distribution, and so on. Without knowing more about the dispersion of the time to failure data, users of MTBF assume an exponential distribution or homogeneous Poisson process (constant failure rate behavior), which is rarely true in my experience.
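To make the "failure-free period" misreading concrete, here is a minimal sketch of the tally-and-divide calculation, followed by the survival probability at the MTBF under the constant failure rate assumption. The unit counts and hours are made-up illustration values, and the function names are hypothetical.

```python
import math

def mtbf(total_operating_hours: float, failures: int) -> float:
    """Point estimate of MTBF: total operating hours divided by failure count."""
    if failures <= 0:
        raise ValueError("need at least one failure to compute MTBF")
    return total_operating_hours / failures

def exponential_reliability(t: float, mtbf_hours: float) -> float:
    """R(t) = exp(-t / MTBF). Valid only if the failure rate is constant."""
    return math.exp(-t / mtbf_hours)

# Hypothetical data: 10 units run 5,000 hours each, 4 failures observed.
theta = mtbf(10 * 5_000, 4)
r_at_mtbf = exponential_reliability(theta, theta)
print(f"MTBF = {theta:.0f} h, R(MTBF) = {r_at_mtbf:.3f}")
# prints: MTBF = 12500 h, R(MTBF) = 0.368
```

Even when the exponential assumption holds, only about 37% of units are expected to survive to the MTBF, which is far from a failure-free period.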

MTBF stated alone provides at best a crude, misleading, and confusing statement about reliability. Without a duration and underlying distribution information I maintain it is less than useful for any application. Even with sufficient additional information, there are better reliability summary metrics available.

MTTF

Mean time to failure, like MTBF, is a common measure. It is generally used for non-repairable items or when the focus is on just first failures. MTTF is a reliability metric calculated by tallying the operating hours and dividing by the number of failures. When the units are not repaired and placed back in service, we have a measure that is a little different than MTBF.

All the same problems that surround MTBF also surround the MTTF measure. I do not recommend using MTBF (obviously), MTTF, nor any of the many variations (like MTBUR).

AFR

Annualized failure rate is the average failure rate over a year. One calculation gathers the number of failures over a year and divides by the number of units that could have failed. Another approach is to collect failure data over a shorter or longer time period and adjust the results to appear as the average failure rate over a year. For example, collect data for a month, then multiply the fraction of units that failed in that month by 12.
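The scaling described above can be sketched as follows. This is a simplified illustration with made-up counts and a hypothetical function name; it assumes a fixed population at risk and a failure rate that does not change over the year, which is exactly the assumption questioned in this section.

```python
def annualized_failure_rate(failures: int, units_at_risk: int,
                            months_observed: float) -> float:
    """Fraction of units failing per year, scaled from the observation window."""
    if units_at_risk <= 0 or months_observed <= 0:
        raise ValueError("units_at_risk and months_observed must be positive")
    fraction_failed = failures / units_at_risk
    # Scale the observed fraction to a 12-month basis.
    return fraction_failed * (12.0 / months_observed)

# Hypothetical data: a full year of tracking versus a single month.
afr_from_year = annualized_failure_rate(failures=24, units_at_risk=1000, months_observed=12)
afr_from_month = annualized_failure_rate(failures=2, units_at_risk=1000, months_observed=1)
print(f"from a year: {afr_from_year:.1%}, from a month: {afr_from_month:.1%}")
```

Both inputs give the same AFR here, yet the one-month sample says nothing about whether the other eleven months behave the same way.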

The basic problem is that the resulting failure rate is an average, and without the underlying dispersion of the data it most likely obscures the information about when to expect failures. If the arrival of failures is not truly random, with an equal chance of occurring in each time unit, then AFR provides only a crude, inaccurate measure.

Jon G. Elerath wrote and presented the paper, “AFR: problems of definition, calculation and measurement in a commercial environment,” at the Reliability and Maintainability Symposium, 2000. I agree with Jon: there is only a slight chance this measure will provide a useful summary of your reliability performance.

Warranty

When asked about the organization’s reliability metrics, some said, “Warranty”. With a little exploration, it turned out they meant either the cost of warranty claims per month (or other time period) or the duration of the offered warranty. The first provides the financial impact of field failures. The second implies a relationship between product performance and a duration set by warranty policy and marketing, a relationship which in general doesn’t exist.

A better warranty related metric is the cost of warranty per unit shipped. This provides a means to relate the field failure rate, the cost to the organization (although only a fraction of the total cost), and the number of shipments. It converts the failure rate and warranty expense into units similar to the bill of material item costs. This allows us to compare the component cost of a part and the warranty cost of a failure for a unit. We can then make decisions on purchasing more expensive, more reliable components and calculate the expected savings on warranty expenses.

One issue is that warranty-type measures rely on failure rates and average warranty costs per failure. Both are averages, which tend to smooth out and obscure the very information we need to make informed decisions.
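As a rough illustration of that comparison, the sketch below puts warranty cost on a per-unit basis and weighs it against the added cost of a more reliable component. All dollar figures and failure fractions are invented for the example, and the function name is hypothetical.

```python
def warranty_cost_per_unit(failure_fraction: float, avg_cost_per_claim: float) -> float:
    """Expected warranty cost per unit shipped, in the same units as a BOM line item."""
    return failure_fraction * avg_cost_per_claim

# Hypothetical scenario: 5% of shipped units fail under warranty at $120 per claim.
current = warranty_cost_per_unit(0.05, 120.0)

# Assumed effect of a better component: failures drop to 2%, part costs $1.50 more.
improved = warranty_cost_per_unit(0.02, 120.0)
extra_component_cost = 1.50

net_savings_per_unit = (current - improved) - extra_component_cost
print(f"current ${current:.2f}, improved ${improved:.2f}, "
      f"net savings ${net_savings_per_unit:.2f} per unit")
```

A positive net savings suggests the more expensive component pays for itself, subject to the caveat that both inputs are averages.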

Life

Grab one of the component data sheets and look for the element that describes the reliability claim. On some data sheets, more than I think makes sense, you may find something like, ‘2,000 hour Life’ or ‘5 year Life’. Now what does that mean? (No pun intended.)

The underlying calculations, testing, experiments, or field data evidence can range from a simple guess to an overly simplified average. For example, incandescent light bulbs may claim 2,000 hour life. William Meeker, author and professor, with his students, tested this claim. They found the bulbs did have an average operating life of 2,000 hours and the time to failure distribution was normal.

Did the 5 year life claim imply that the unit would last failure free for 5 years, or did it mean an average (unknown failure distribution) of 5 years, or was it a 5 year MTBF value? The sales folks liked the 5 years with little or no chance of failure. The evidence supporting the claim was a poorly done MIL-HDBK-217 based parts count prediction, with components not found in the handbook excluded from the analysis.

When you see the ‘life’ metric you really should be asking a lot of questions to find out what is meant.

L10 Life

The American Bearing Manufacturers Association prefers L10 life (also called B10), which is the number of hours in service that 90% of bearings survive. It is also used in toxicology studies as the time until 10 of the 100 fish have died. The L10 provides the 10th percentile of the unknown time to failure distribution. It provides either a single experimental result or a tabulated average.

L1, or the first percentile, is similar and has the same issues. Neither provides information on the dispersion of the time to failure data. Thus L10, like MTBF, AFR, and other average-based metrics, tends to obscure the information we need to make rational decisions.
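A short sketch can show how much a single percentile hides. It assumes a two-parameter Weibull time to failure distribution, R(t) = exp(-(t/eta)^beta), and builds three populations that share the same L10 but have different shape parameters. The L10 value and the betas are arbitrary illustration values.

```python
import math

def weibull_reliability(t: float, beta: float, eta: float) -> float:
    """Two-parameter Weibull survival function R(t) = exp(-(t/eta)**beta)."""
    return math.exp(-((t / eta) ** beta))

def eta_from_l10(l10: float, beta: float) -> float:
    """Scale parameter eta that puts the 10th percentile of life at l10."""
    return l10 / (-math.log(0.9)) ** (1.0 / beta)

L10_HOURS = 1_000.0          # hypothetical claimed L10 life
BETAS = (0.8, 1.5, 3.0)      # early-life, mild wear-out, strong wear-out

results = {}
for beta in BETAS:
    eta = eta_from_l10(L10_HOURS, beta)
    results[beta] = {
        "at_L10": weibull_reliability(L10_HOURS, beta, eta),
        "at_half_L10": weibull_reliability(0.5 * L10_HOURS, beta, eta),
        "at_twice_L10": weibull_reliability(2.0 * L10_HOURS, beta, eta),
    }
    r = results[beta]
    print(f"beta={beta}: R(L10/2)={r['at_half_L10']:.3f}, "
          f"R(L10)={r['at_L10']:.3f}, R(2*L10)={r['at_twice_L10']:.3f}")
```

All three populations report an identical L10, yet the fraction surviving at half or twice that time differs considerably with the shape parameter.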

I should mention that a plot of the L10 or most any metric on a monthly basis is commonly done to ‘spot’ trends. Using more of a bad metric doesn’t help make it useful.

Reliability

Why not just measure reliability directly? Assuming we understand or can find the documentation for the functions and environment, reliability includes the probability of successful operation over a duration. So, 98% of the inkjet printers in a home office will survive 2 years. That would work as a goal. We can then either conduct experiments or track field performance and (maybe using a Weibull distribution with the data) compare the results to the goal.

Of course, we can set metrics for the first month, say 99.9% reliable over the first month of operation. Or, we can set and monitor reliability over the same duration as the warranty period. Reliability is just the probability a unit will survive the stated duration, or the percentage of units that survive the duration.

For repairable systems, like a car or plant machinery, we are interested in availability, which combines reliability and time to repair.
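Here is a minimal sketch of checking such goals against a time to failure model. It assumes a Weibull distribution has already been fitted to test or field data; the beta and eta values below are placeholders, not real printer data, and the fitting step itself is not shown.

```python
import math

def weibull_reliability(t_years: float, beta: float, eta_years: float) -> float:
    """Probability of surviving to t_years under a two-parameter Weibull model."""
    return math.exp(-((t_years / eta_years) ** beta))

# Placeholder fitted parameters (assumed for illustration only).
BETA, ETA_YEARS = 1.2, 80.0

# Goals from the text: 99.9% over the first month, 98% over 2 years.
goals = {"first month": (1.0 / 12.0, 0.999), "2 years": (2.0, 0.98)}

met = {}
for name, (duration, target) in goals.items():
    r = weibull_reliability(duration, BETA, ETA_YEARS)
    met[name] = r >= target
    status = "meets" if met[name] else "misses"
    print(f"{name}: R = {r:.4f}, goal {target:.3f}, {status} goal")
```

Because the goal states both a probability and a duration, the same fitted model can be checked at any duration of interest, including the warranty period.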
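One simple way to express availability is the fraction of time a system is up. The sketch below uses that definition with made-up uptime and downtime totals; other availability definitions exist and may suit a given situation better.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Fraction of total time the system was able to operate."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_hours / total

# Hypothetical year of plant machinery logs: 8,500 h running, 260 h under repair.
a = availability(8_500.0, 260.0)
print(f"availability = {a:.1%}")
```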

Summary

Get beyond averages and use the probability and statistics you probably know you should know.

What metrics does your organization use and why? If you use MTBF because everyone else does, please take a look at how much money your organization could save by using reliability instead. Seriously, leave a comment and just state the metric you use.


Comments

  1. Paul Franklin says

    October 28, 2015 at 9:31 AM

    Well said, Fred. Thanks for mentioning life metrics. L10, if I recall my history, came from studies of bearing failures, which are often (and sometimes successfully) modeled using the Weibull distribution. Engineers made the observation that once 10% of the bearings had failed, then the rest of them failed “rapidly,” and that knowing L50 (the median life) didn’t make much practical difference when it came to ordering spares and keeping equipment up and running. My recollection of the history is that engineering experience indicated that in this specific situation, it was possible to develop a useful rule of thumb. Unless you know that L10 is a good rule of thumb because there’s other evidence, then it’s just guess work. And as you point out, doing the wrong thing with greater precision and efficiency is still doing the wrong thing.

    • Fred Schenkelberg says

      October 28, 2015 at 9:36 AM

      Thanks for the comment and kind words, Paul. Like you, if my memory serves, didn’t W. Weibull’s study of bearings lead to the Weibull distribution? Interesting that it led to the use of L10 instead of the distribution, which would actually be more useful to describe the reliability performance of bearings over time. Thanks for the insight and background on L10.

      Cheers,

      Fred

  2. Mick McWilliams says

    November 3, 2015 at 2:04 PM

    Thanks for a tidy explanation Fred, on what happens when people stop at the surface of data.

    We used MTBF for some components on entry into service due to their fresh design, however use of MTBF was specified as a guide only.
    In the future we will graduate to proper reliability modelling when we have the data, which raises another important issue: RECORD EVERYTHING!

    • Fred Schenkelberg says

      November 3, 2015 at 4:22 PM

      totally agree Mick, thanks for the comment. cheers, Fred
