Accendo Reliability

Your Reliability Engineering Professional Development Site

by Fred Schenkelberg 2 Comments

Are You Tracking and Reporting Field Failures Well?

Fielded products fail every day. Customers report these failures, generally seeking a remedy. Gathering the reported failures, returned products, or confirmed failures is common practice.

Depending on the product, a simple replacement or exchange may suffice. For other products, repair or a refund may be appropriate.

In general, though not always, when a product fails in the hands of a customer, the organization designing, manufacturing, and distributing the product learns of the failure.

A common practice is to count the number of returns per week or month, tallying items as they arrive. This monthly tally is then easy to plot as a simple bar chart showing the count of returns per month over time.

The issue is that the number of units shipped changes month to month, so the number of items that could possibly fail changes as well. If we ship twice as many units, the number of field failures could double even though the actual failure rate of the product has not changed.

A Very Simple Example

Let’s look at a very simple example.

Suppose a new product has a 10% failure rate in the first month and no failures after the first month, and we ship 100 units. In the first month we would receive 10 failed units back. If we ship 100 units per month for the first three months of the year, we would receive 10 units back each month.

Now let’s say in April another customer orders an additional 100 units, so we ship 200 items. Given the same failure rate, we would receive 20 units back. That doubles the number of returns month over month: a 100% increase in field returns per month.

In this very simple example, it is obvious that the number of units shipped doubled. Tracking the failure rate, rather than the count of returns, would be the appropriate measure, as we are interested in noticing a change in failure rate. Being able to identify such a change permits identification and resolution of the factors causing an increase in failure rate, or the continuation of whatever is causing a lower failure rate.
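The arithmetic of this example can be sketched in a few lines of Python. This is only an illustration, assuming the 10% first-month failure rate and the shipment quantities given above; the helper name `monthly_returns` is hypothetical.

```python
# Sketch of the simple example: a fixed 10% first-month failure rate,
# with no failures after the first month (as assumed in the example).
FIRST_MONTH_FAILURE_RATE = 0.10

# Units shipped per month, as in the example (April doubles to 200).
shipments = {"Jan": 100, "Feb": 100, "Mar": 100, "Apr": 200}


def monthly_returns(shipped, rate=FIRST_MONTH_FAILURE_RATE):
    """Expected returns for one month's shipments (hypothetical helper)."""
    return round(shipped * rate)


for month, shipped in shipments.items():
    returned = monthly_returns(shipped)
    # The count of returns changes with shipments; the rate does not.
    print(f"{month}: shipped={shipped} returns={returned} "
          f"rate={returned / shipped:.0%}")
```

The count of returns doubles in April while the rate stays at 10%, which is why a plot of raw counts alone can suggest a problem that is not there.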

Two things complicate this approach. Both the number of units produced and shipped vary, and the chance of a specific unit failing changes over time.

Shipments Vary

First, we often change the actual number of shipments per unit time.

While the forecast for shipments or sales may include nice round numbers per month, in reality shipments are often quite variable. If the plan is to ship an average of 5,000 units per month, the long-term average may work out to 5,000 units per month, yet the actual monthly shipments may vary.

The first month may be only 100 units, as production started just days before the end of the month. The next month, while the production line was still ramping up, the team could produce, and thus ship, only 2,523 units. The third month, in order to meet early demand, the team works overtime and creates 6,467 units. And so on.

Variation in production capability, availability of necessary components and materials, holidays (production shutdowns), changes in customer demand, and many other elements change how many units are actually produced and shipped per month.

Failure Rates Vary

Even simple products have dozens, if not hundreds or thousands, of ways they can fail.

Each failure mechanism has a finite probability of occurring on any specific day. It’s a race to see which failure mechanism succeeds in causing a failure.

Consider a specific product that experienced an error during assembly, say a missing component for a specific function, and that somehow shipped to a customer. It may fail immediately on first use, or it may lie dormant for months before that specific function is called into action and then exhibits the failure. Or the missing part could lead to slow degradation of a function, resulting in a reported failure only many years after first use.

The same basic variability applies to each specific failure mechanism. A wear-out mechanism may occur early with aggressive overuse, or only after an exceptionally long period of light, infrequent use. Corrosion-related failure mechanisms may occur quickly or not at all, depending on the local humidity conditions.

In general, there is some pattern to specific failure mechanisms, yet they do exhibit variability of when failures occur.
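One way to picture the race between failure mechanisms is to give each mechanism its own small daily probability of occurring. The sketch below assumes the mechanisms act independently and that each probability is constant from day to day, which is a simplification. The mechanism names and probabilities are invented for illustration and do not come from any field data.

```python
import math

# Hypothetical daily probabilities for three failure mechanisms.
# These values are invented for illustration only.
daily_probability = {
    "assembly_error": 0.0005,
    "wear_out": 0.0002,
    "corrosion": 0.0001,
}


def daily_failure_probability(probabilities):
    """Chance that at least one mechanism causes a failure on a given day.

    Assumes the mechanisms are independent: the unit survives the day
    only if none of the mechanisms occurs.
    """
    survive = math.prod(1.0 - p for p in probabilities)
    return 1.0 - survive


p_any = daily_failure_probability(daily_probability.values())
print(f"Chance of any failure on a given day: {p_any:.4%}")
```

In practice the probability for each mechanism changes with age, use, and environment, which is the variability described above.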

Still a Simple Example

Let’s complicate the simple example described above. Instead of a fixed first-month failure rate of 10%, let’s say the product has the following number of returns for 100 units initially shipped in January.

Month    Returns
Jan      1
Feb      5
Mar      4

At the end of three months, the total failure rate is 10%, yet it was only 1% in the first month and had reached a cumulative 6% by the end of the second month.

Now let’s imagine this organization ships 100 units in February and then again in March and each month’s production follows the same failure pattern. What would that look like over the first three months of production tracking cumulative shipments and returns per month?

Month    Cumulative Returns    Cumulative Shipments
Jan      1                     100
Feb      7                     200
Mar      17                    300

Plotting the number of failures per month alone is not informative. Plotting the cumulative failure rate per month accounts for the number of units shipped, yet it is still not very informative. The cumulative failure rates for the three months are 1%, 3.5%, and about 5.7% (17 returns out of 300 units shipped).

The problem is that customers have a 10% chance of product failure after three months, not 5.7%. Tracking cumulative failure rates using the cumulative number of returns and shipments underreports the failure rate in this case for customers that have the initial month’s units, as those units are now three months old. It may take many more months to recognize the underlying pattern of failures based on the age of the individual units.
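The cumulative figures in this example can be reproduced with a short Python sketch. It assumes each monthly shipment is a cohort of 100 units that follows the same 1, 5, 4 return pattern by age; the function name `cumulative_tracking` is hypothetical.

```python
# Returns per cohort of 100 units, by age of the unit in months
# (the Jan/Feb/Mar pattern from the single-shipment example).
RETURNS_BY_AGE = [1, 5, 4]
COHORT_SIZE = 100
MONTHS = ["Jan", "Feb", "Mar"]


def cumulative_tracking(returns_by_age, cohort_size, n_months):
    """Per month: (cumulative returns, cumulative shipments, cumulative rate)."""
    rows = []
    total_returns = 0
    total_shipped = 0
    for month in range(n_months):
        total_shipped += cohort_size
        # Every cohort shipped in or before this month contributes the
        # returns that match its current age.
        for shipped_month in range(month + 1):
            age = month - shipped_month
            if age < len(returns_by_age):
                total_returns += returns_by_age[age]
        rows.append((total_returns, total_shipped, total_returns / total_shipped))
    return rows


rows = cumulative_tracking(RETURNS_BY_AGE, COHORT_SIZE, len(MONTHS))
for name, (returns, shipped, rate) in zip(MONTHS, rows):
    print(f"{name}: returns={returns} shipments={shipped} rate={rate:.1%}")
```

The sketch yields cumulative returns of 1, 7, and 17 against cumulative shipments of 100, 200, and 300. The newest units dilute the rate because they have not yet had time to fail.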

Tracking and reporting based on the age of the unit is a better approach. Time to failure analysis of the data allows us to consider the probability of failure over time, just as the customer experiences the product.
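As a rough illustration of tracking by unit age, the sketch below regroups the same example returns by the age of the unit at the time of return. It is a simplified assumption of how such a tally might be kept, not necessarily the method described in the next article; the names `returns_by_cohort` and `failure_fraction_by_age` are hypothetical. For simplicity it divides by units shipped rather than by units still surviving.

```python
# Returns observed for each monthly cohort, listed by the unit's age in
# months at the time of return. Values follow the example pattern; later
# cohorts have spent less time in the field, so fewer ages are observed.
COHORT_SIZE = 100
returns_by_cohort = {
    "Jan": [1, 5, 4],  # ages 1, 2, and 3 months observed by end of March
    "Feb": [1, 5],     # ages 1 and 2 months
    "Mar": [1],        # age 1 month only
}


def failure_fraction_by_age(cohorts, cohort_size):
    """Cumulative fraction of shipped units returned by each age.

    For each age, divide the returns at that age by the number of
    shipped units that have been in the field long enough to reach it.
    """
    max_age = max(len(r) for r in cohorts.values())
    cumulative = 0.0
    result = []
    for age in range(max_age):
        observed = [r[age] for r in cohorts.values() if len(r) > age]
        exposed = cohort_size * len(observed)
        cumulative += sum(observed) / exposed
        result.append(cumulative)
    return result


fractions = failure_fraction_by_age(returns_by_cohort, COHORT_SIZE)
for age, fraction in enumerate(fractions, start=1):
    print(f"By age {age} month(s): {fraction:.0%} of units returned")
```

Grouped this way, the same data recovers the 1%, 6%, and 10% pattern that a customer with a three-month-old unit actually experiences.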

The next article will describe a convenient way to track shipments and returns that preserves the time to failure information. How do you gather and report your field data?



Filed Under: Articles, Musings on Reliability and Maintenance Topics, on Product Reliability Tagged With: field data

About Fred Schenkelberg

I am the reliability expert at FMS Reliability, a reliability engineering and management consulting firm I founded in 2004. I left Hewlett Packard (HP)’s Reliability Team, where I helped create a culture of reliability across the corporation, to assist other organizations.


Comments

  1. Hanh Nguyen says

    December 1, 2019 at 7:52 PM

    Thanks for your article. In this article, you stated the following:

    “Tracking and reporting based on the age of the unit is a better approach. Time to failure analysis of the data allows us to consider the probability of failure over time, just as the customer experiences the product.”

    Do you have an article about this approach?

    • Fred Schenkelberg says

      December 2, 2019 at 9:22 AM

      Hi Hanh,

      Thanks for reading the article and your question.

      Yes, we do have many articles and podcast episodes on the topic. The overall concept of data analysis is broadly covered in these articles

      https://lucas-accendo-site-speed.sprod01.rmkr.net/?s=data+analysis

      and one specific approach, Weibull analysis, is covered in these articles:

      https://lucas-accendo-site-speed.sprod01.rmkr.net/?s=weibull+analysis

      the first result in the Weibull list is a plotting tool (you will need to be logged into the site and signed up to review the tool to view/use it).

      Cheers,

      Fred


Article by Fred Schenkelberg
in the Musings series


© 2025 FMS Reliability