Accendo Reliability

Your Reliability Engineering Professional Development Site


by nomtbf

5 Ways Your Reliability Metrics are Fooling You


We measure results. We measure profit, shipments, and reliability.

The measures or metrics help us determine if we’re meeting our goals, if something bad or good is happening, and if we need to alter our course.

We rely on metrics to guide our business decisions.

Sometimes, our metrics obscure, confuse or distort the very signals we’re trying to comprehend.

Here are five metric-based mistakes I’ve seen in various organizations. Being aware of the limitations or faults in these examples may help you improve the metrics you use on a day-to-day basis. I don’t always have a better option for your particular situation, yet using a metric that helps you make poor decisions generally isn’t acceptable.

If you know of a better way to employ similar measures, please add your thoughts to the comments section below.

Pareto Charts

A wonderful arrangement of counts of failures (or whatever set of categories you’re tracking). Using the concept that 80% of the problems come from 20% of the causes, we use this arrangement of information to prioritize our focus.

Solve the big hitters, those that occur frequently.

Is this always the right approach?

I say no.

Sure, the concept is sound, yet the practice often only counts and displays the reported symptoms, not the root causes. The Pareto principle works on causes, not on symptoms. Keep in mind that a failure to power on may have dozens of underlying possible causes. Which are you going to solve? Strive to track causes, not symptoms, which is good advice in general. Applying this to Pareto charts takes more work, yet provides clearer direction for your improvement efforts.

Another issue with a Pareto chart is that it does not include the severity of the failure. A few cell phone battery fires may get your product pulled from stores and airplanes. Yet 200,000 scratched cases on an otherwise functioning phone may divert resources to solve a problem that most likely doesn’t need immediate attention. Adding weighting to indicate severity or cost of failure may help the chart convey what is most urgent and important, not just what is most common.
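As a minimal sketch of this weighting idea (the failure categories, counts, and severity weights below are entirely hypothetical), compare the ranking by raw count against the ranking by count times severity:

```python
# Hypothetical failure category counts from field returns.
failure_counts = {
    "scratched case": 200_000,
    "battery fire": 5,
    "no power on": 1_200,
}
# Hypothetical severity weights: relative cost/risk per single failure.
severity = {
    "scratched case": 1,      # cosmetic annoyance
    "battery fire": 500_000,  # safety recall risk
    "no power on": 50,        # warranty replacement cost
}

# Rank categories by raw count (a plain Pareto chart)...
by_count = sorted(failure_counts, key=failure_counts.get, reverse=True)
# ...and by count weighted with severity.
by_weighted = sorted(
    failure_counts,
    key=lambda k: failure_counts[k] * severity[k],
    reverse=True,
)
print(by_count[0])     # most frequent category
print(by_weighted[0])  # most urgent category once severity is applied
```

With these made-up numbers, the plain count puts the cosmetic issue on top, while the weighted ranking correctly surfaces the rare but catastrophic failure.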

12 Month Rolling Average Warranty Returns

Whether a count of returns, a cost of returns, or both, the rolling average tends to smooth out the trends. A doubling of returns in the most recent month may go unnoticed when averaged with the previous 11 months of data.

If you’re only interested in long-term trends, a rolling average smooths the curve, yet it often obscures the cyclic and noisy bits you may need to know concerning warranty.

The other issue I have with this approach is the changing denominator. As your product ramps in production, the average failure rate will appear to decrease when in fact it is increasing. Let’s say some proportion of products fail after three months of use, with very few failures in the first two months. Using a rolling average that includes each month’s additional units exposed, we may swamp the signal from the oldest units that they are getting older and starting to fail more often.

As the initially produced units start to fail at an increasing rate, the latest months’ production, often larger than the initial months’, may produce the false signal that everything is getting better.

Another issue is that it may take a year or more for a step change in failure rates to alter the trend. Let’s say in June the actual failure rate for that month doubled. Since we have 11 months that experienced the lower rate, the average will increase only slightly. If the new, higher failure rate remains in play, each month the average will continue to creep upward and will not reflect the magnitude of the actual failure rate for a year.
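This lag is easy to demonstrate. In the sketch below, the monthly failure rates are hypothetical: a steady 1% for a year followed by a sustained doubling to 2%. The 12-month rolling average barely moves the month the rate doubles and only reflects the full doubling a year later:

```python
# Hypothetical monthly failure rates: 1% for 12 months, then 2% for 12 months.
monthly_rate = [0.01] * 12 + [0.02] * 12

def rolling_avg(series, window=12):
    """Trailing rolling average; first value covers the first full window."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

avg = rolling_avg(monthly_rate)
# First window containing one doubled month: the average nudges up slightly.
print(round(avg[1], 5))
# Only the final window, a full year after the step, shows the true 2% rate.
print(round(avg[-1], 5))
```

The actual rate doubled in a single month, yet the averaged metric takes twelve months to say so.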

When possible (and it’s always possible), do not use rolling averages.

MTBF by Month

I was going to list this first or last, given my dislike for MTBF in general. Yet I suspected placing it third in this list might surprise you a bit. I guess this list is not in any particular order.

Regular readers of this blog understand the many issues with MTBF. Yet, we all have seen and continue to see MTBF tracked in a variety of ways. My favorite example of a very poor measure is to track MTBF values as a month to month trend.

Of course, using only MTBF basically erases any information about the rate of change of the failure rate. Tracking field failures, we have the information to determine if older units are failing at a higher rate, or if we have a significant early life failure issue. The averaging done to create MTBF values wipes all that critical information away.

It is possible to implement a change that reduces early life failures and then experience a reduction in MTBF. The slope on a Weibull plot shifts from below one to above one, pivoting in a way that appears to reduce MTBF. The product is actually better reliability-wise, yet our tracking metric suggests our improvements degraded reliability performance.
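To see how much MTBF erases, consider a sketch with two hypothetical Weibull populations tuned to the same mean life. The target MTBF of 1000 hours and the shape values (0.5 for early life failures, 3.0 for wear-out) are assumptions for illustration; the two populations report the identical MTBF while behaving completely differently in the field:

```python
import math

def weibull_mean(beta, eta):
    """Mean (MTBF) of a Weibull distribution with shape beta, scale eta."""
    return eta * math.gamma(1 + 1 / beta)

def weibull_cdf(t, beta, eta):
    """Fraction of units failed by age t."""
    return 1 - math.exp(-((t / eta) ** beta))

target_mtbf = 1000.0  # hours, hypothetical target
# Choose the scale eta so each shape produces the same mean life.
pop = {beta: target_mtbf / math.gamma(1 + 1 / beta) for beta in (0.5, 3.0)}

for beta, eta in pop.items():
    print(
        f"shape={beta}: MTBF={weibull_mean(beta, eta):.0f} h, "
        f"failed by 100 h: {weibull_cdf(100, beta, eta):.1%}"
    )
```

With these numbers, the early-life population (shape 0.5) loses roughly a third of its units by 100 hours, while the wear-out population (shape 3.0) loses well under one percent, yet a monthly MTBF chart would show both as identical.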

SPC C-chart of Returns by Day

I’ve only seen this once. At first, I thought this would be a clever use of control charts to monitor field returns. A closer look suggests it is obscuring some essential information.

The returns received are from a range of product ages. The count of failures includes both early life and wear-out failures. The counts increase and decrease, even on a day-to-day basis, and may reflect prior shipment counts more than anything else. The data is also muddled by the vagaries of customers collecting failed units in order to ship them together.

Compare your call center data on the time from issuance of a return material authorization (RMA) to when the unit is actually received. The one time I did that, the long-tailed, skewed delay simply complicated our estimates of how long the product was actually used.

Instead, use the call center data if you need counts. The issue of changes in daily or monthly shipments will still need attention to make this a useful tracking method.

Returns this Month over Shipments this Month

Unfortunately, I’ve seen this approach used a few times. This is simply counting how many units got returned in a month over how many were shipped that same month.

It might be possible for a unit shipped early in the month to fail and be returned that same month, yet that is unlikely in most circumstances. The ratio of these two unrelated counts does nothing to inform your team about patterns or changes in field failure rates.

An increase or decrease in a monthly shipment tally may have a larger impact on the month’s results. A spike in returns may fall during a particularly high-shipment month, blunting its message.

This ratio does not provide even a poor estimate of field failure rates. What we want is the ratio of returns over the units at risk of failing. While a bit more difficult to keep track of than monthly returns and shipment counts, it reflects field failures a tad better.
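A small sketch, with made-up shipment and return counts, contrasts the naive same-month ratio with returns over units at risk. A shipment spike in the latest month makes the naive ratio look artificially low:

```python
# Hypothetical monthly data: a production ramp in month 3.
shipments = [1000, 1000, 5000]   # units shipped in months 1..3
returns   = [0,    10,   20]     # units returned in months 1..3

month = 2  # index of the latest month (month 3)

# Naive metric: this month's returns over this month's shipments.
naive = returns[month] / shipments[month]

# Better: returns over units at risk, i.e. everything shipped in earlier
# months and not already returned (same-month shipments rarely fail yet).
at_risk = sum(shipments[:month]) - sum(returns[:month])
better = returns[month] / at_risk

print(f"naive: {naive:.2%}, at-risk based: {better:.2%}")
```

With these numbers, the naive ratio reports 0.4% while the at-risk version reports roughly 1%, because the 5000-unit shipment spike inflates the naive denominator without contributing any units old enough to fail.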

Of course, tracking failure rates properly still runs into the average issues mentioned above. Instead, let’s track failure rates by age of unit. How many failures are occurring in the first month after shipment or installation? How many failures occur over the warranty period?

Better still would be to track changes to the life distribution. I like starting with our projected Weibull distribution (or the appropriate life distribution for your product) on a cumulative distribution function (CDF) plot. Then, as returns occur, using the age of the units when they fail along with the censored data, build the field failure Weibull curve for comparison to the expected curve.
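One way to build those field points, sketched here with entirely made-up ages, is a product-limit (Kaplan-Meier style) estimate: units still in service count as censored, and each failure age yields an empirical CDF point to overlay on the expected curve.

```python
# Hypothetical field data: unit ages in months at return (failures) and
# ages of units still operating in the field (right-censored).
failures = [3, 5, 5, 8]
censored = [6, 9, 9, 10, 12]

events = sorted(set(failures))
survival = 1.0
points = []  # (age, empirical fraction failed by that age)
for t in events:
    # Units at risk just before age t: any unit that reached age t.
    at_risk = sum(1 for age in failures + censored if age >= t)
    deaths = failures.count(t)
    survival *= 1 - deaths / at_risk  # product-limit survival update
    points.append((t, 1 - survival))

for age, frac_failed in points:
    print(f"age {age} mo: {frac_failed:.1%} failed")
```

Plotting these points against the projected Weibull CDF shows at a glance whether the field is failing earlier or later than expected, information a single MTBF or monthly ratio can never carry.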

These are just a few of the problems with metrics I’ve seen, plus a few suggestions to do better. What’s your take on this? What kind of poor metrics have you seen? What is your go-to way to track field failures? Add your comments below.

Filed Under: Uncategorized Tagged With: measures, metrics


Comments

  1. Rakesh jha says

    February 4, 2017 at 7:56 PM

Very good article, and it reinforces that there is no single KPI to cover everything. There is always a need to understand which reliability targets we want to achieve and then set up the KPI. Further, along with tracking the numbers, a bit of analysis is required to understand the insight behind the story, and this also depends on the reliability journey and maturity level of an organisation. Great work and thanks for sharing!

© 2025 FMS Reliability · Privacy Policy · Terms of Service · Cookies Policy