Accendo Reliability

Your Reliability Engineering Professional Development Site

by nomtbf

When Do Failures Count?

One technique to calculate a product’s MTBF is to count the number of failures and divide that count into the total operating time.
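As a minimal sketch of that calculation, let $T$ be the total operating time tallied across all units and $r$ the number of counted failures. Both symbols and the numbers in the example are chosen here for illustration only.

```latex
\[
\mathrm{MTBF} = \frac{T}{r}
\qquad\text{e.g.}\qquad
\frac{10 \text{ units} \times 1000 \text{ h}}{4 \text{ failures}} = 2500 \text{ h per failure}
\]
```

Because $r$ sits in the denominator, every decision about what is or is not counted as a failure moves the result directly.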

You already know, kind reader, that using MTBF has its own perils, yet it is done. We do not have to look very far to see someone estimating or calculating MTBF as if it were a useful representation of reliability… alas, I digress.

Counting failures would appear to be an easy task. It apparently is not.

What Is a Failure That Should Be Counted?

This may not seem like a fair question.

Keep in mind that not all failures have the same consequences. Some cause serious problems for all involved, while other failures may never be noticed.

A product has many levels of specification and requirements. There may be layers of tolerances. Not every product is identical.

Not every failure is of interest. Not every failure is of interest to each customer. A failure that causes a product return or complaint by one person may not cause any response from others. Is a paint blemish a failure? Do you count it as a failure?

You might. Or you might not.

For every product, and for every situation in which you are assessing failures, take the time to clearly define what a failure is. Define a sharp dividing line between what you call a failure and what you do not. I would err on the side of calling more things failures, so that anything a customer would define as a failure is captured as a failure worth counting.

When Do We Start Counting Failures?

We’re talking reliability here, so do your out-of-box or first start-up failures count as quality failures or as reliability failures?

If the product fails in the factory, is that a reliability failure? Often we track yield and do not count these as reliability issues, yet we do know there is a link.

If an early prototype fails, which is actually quite common, is that a reliability failure? It’s too easy to dismiss these failures as just part of the development process.

When counting failures, do you include prototypes, manufacturing, early life, and beyond? Or just some subset? When not counting all failures, is there a clear and well-understood reason and a definition of what is and is not countable?

I suggest counting all failures right from the first prototype. Track all failures. Monitor, measure, analyze, and then select where to make improvements. Dismissing or avoiding some failures limits your ability to understand where to focus your reliability improvements.

Which Failures Do We Include in the Count?

I recently heard a client say they were not going to track (count) software failures. Instead, they were going to focus on hardware failures only. This is troubling.

By not tracking all failures, you skew the information you can glean from your failure tracking. Avoiding one class of failure limits your ability to determine if you are actually focusing your efforts on solving the right problems.

If you do not count a failure reported by a customer because you are unable to replicate the issue in house, you limit your visibility into what’s happening from your customer’s point of view.

If you do not count a failure because you’ve seen it before, you limit your ability to prioritize based on the relative frequency of failures.
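To illustrate prioritizing by relative frequency, here is a minimal sketch in Python. The failure log, its category names, and the `relative_frequency` helper are all hypothetical, made up for this example. The point is only that repeat failures have to stay in the log for the ranking to mean anything.

```python
from collections import Counter

# Hypothetical failure log: one entry per reported failure, repeats included.
# The categories and counts are invented for illustration.
failure_log = [
    "software", "connector", "software", "paint blemish",
    "software", "power supply", "connector", "software",
]

def relative_frequency(log):
    """Return (category, count, share) tuples, most frequent first."""
    counts = Counter(log)
    total = sum(counts.values())
    return [(cat, n, n / total) for cat, n in counts.most_common()]

ranking = relative_frequency(failure_log)
for category, n, share in ranking:
    print(f"{category:15s} {n:2d}  {share:.0%}")
```

If the repeated "software" entries had been dropped as already seen, that category would tie with the rarest ones and lose its place at the top of the list.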

If you do not count a failure because you are unable (or unwilling) to determine the root cause, I suspect you are not willing to learn from product failures.

My advice is to count all failures.

Better is to track time to failure for all failures, plus conduct detailed root cause analysis of each failure.

When do failures count? Always. Every failure counts, as each failure contains information that permits you to make reliability improvements for your customers. If you do not count some failures, you will always overestimate your MTBF.
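The overestimate follows directly from the arithmetic: with the same operating time, fewer counted failures means a larger quotient. A minimal sketch in Python, using hypothetical operating hours and failure counts (the `mtbf` helper and the category names are assumptions for illustration):

```python
def mtbf(total_operating_hours, failure_count):
    """Total operating time divided by the number of counted failures."""
    if failure_count == 0:
        raise ValueError("MTBF is undefined when no failures are counted")
    return total_operating_hours / failure_count

# Hypothetical field data, invented for illustration.
total_hours = 50_000
failures = {"hardware": 10, "software": 12, "could not replicate": 3}

mtbf_all = mtbf(total_hours, sum(failures.values()))  # every failure counted
mtbf_hw = mtbf(total_hours, failures["hardware"])     # hardware failures only

print(f"All failures counted: {mtbf_all:.0f} h")
print(f"Hardware only:        {mtbf_hw:.0f} h")
```

In this made-up case, leaving out the software and could-not-replicate failures more than doubles the reported MTBF without the product being any more reliable.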

What’s your take on what counts? Leave a comment and let me know how you justify limiting what is considered a ‘countable’ failure.

Filed Under: Uncategorized

Comments

  1. Dave Hartman says

    December 15, 2016 at 3:09 PM

    Hello?
    Testing….
    Is this thing on?

  2. Fred Schenkelberg says

    December 15, 2016 at 3:10 PM

    Got it Dave, thanks for the help troubleshooting the firewall issue and getting the ability to comment working again. cheers, Fred

  3. Dave Hartman says

    December 15, 2016 at 3:25 PM

    I work primarily in aerospace and defense, so every failure counts. Development failures, like HALT, and qualification testing usually point to specification and design errors. Production HASS test failures reveal build quality and some design errors. Field failures often point to specification and operator induced failures, but frequently some design weakness that escaped the previous torture testing. If the customer wants me to track the field MTBF I will require them to supply documented operating time, storage time, operating conditions like temperature, temperature cycling, humidity, shock, vibration, fault symptoms, error codes, and other information with each returned unit. They usually back down.

    • Fred Schenkelberg says

      December 15, 2016 at 3:29 PM

      Great comment, Dave. I like the listing of typical suspects to examine when a failure occurs. cheers, Fred

    • Dave Hartman says

      December 15, 2016 at 3:33 PM

      I almost forgot, they also need to supply the vehicle chassis/tail number, what other equipment was replaced, what O-Level I-Level and D-Level testing and fault verification was performed, if the replacement cleared the fault, maintenance procedures, and there’s probably more but those are the big ones. Usually they’re clueless.

© 2025 FMS Reliability · Privacy Policy · Terms of Service · Cookies Policy