Life Data Analysis with Only 2 Failures

Here’s a common problem: You have been tasked with peering into the future to predict when the next failure will occur.

Predictions are tough.

One way to approach this problem is to analyze the history of failures of the most typical system. The issue looms larger when you have only two observed failures from the population of systems in question.

While you can fit a straight line to two failures and account for all the systems that operated without failure, it is not very satisfactory. It is at best a crude estimate.

Let’s not consider calculating MTBF. That would not provide useful information, as regular readers already know. So what can you do, given just two failures, to create a meaningful estimate of future failures? Let’s explore a few options.

What Information Do We Have Available?

Well, two failures are a start. Of course, there are a number of questions about those two failures that may provide helpful answers.

When did they fail? How long did they operate? This provides just a sketch of time to failure information.

How did they fail? What are the failure mechanisms? Maybe there is a time-to-failure model that describes these failures.

The more we know about the two failures, the better we are able to estimate other failures in the population. Speaking of the population, how many elements are in it? Is there anything unique about the two failures compared to the remaining items? How about operating time for all items?

An Analysis Based on Similar Failures

If we know the failure mechanisms and the time-to-failure information, we may be able to use existing models or historical knowledge of similar failures to create an estimate of reliability performance. Some may call this a Bayesian approach. Use what you know, both statistically and technically, to your advantage.

An Analysis Based on a Published Model

Knowing the failure mechanism may permit finding a published model that describes the time-to-failure pattern. The times to failure for the two failed units then let you adjust the model to fit what you have observed.

An Analysis Assuming a Beta Value

If the failure mechanism suggests a particular pattern of failures over time, say a wear-out mechanism, we may be able to assume a beta value (for a Weibull distribution). The assumed beta supplies the slope and the two known failures supply the point, which together give a rough estimate using a point-and-slope approach.
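As a rough sketch of what this might look like, one common fixed-beta calculation (often called Weibayes) estimates the Weibull characteristic life, eta, from the failure times plus the operating hours of the units that have not failed. This is not necessarily the exact point-and-slope construction described above. The failure times, survivor hours, and the beta of 3 below are all made up for illustration.

```python
import math


def weibayes_eta(failure_times, suspension_times, beta):
    """Estimate Weibull characteristic life (eta) for an assumed shape (beta).

    Uses eta = (sum(t_i ** beta) / r) ** (1 / beta), where the sum runs over
    every unit (failed or still running) and r is the number of failures.
    """
    r = len(failure_times)
    if r == 0:
        raise ValueError("need at least one failure")
    total = sum(t ** beta for t in failure_times)
    total += sum(t ** beta for t in suspension_times)
    return (total / r) ** (1.0 / beta)


def weibull_reliability(t, eta, beta):
    """Probability that a unit survives to time t under the fitted Weibull."""
    return math.exp(-((t / eta) ** beta))


# Hypothetical data: 2 failures and 48 units still running at 2,000 hours.
failures = [1200.0, 1800.0]
suspensions = [2000.0] * 48
beta = 3.0  # assumed from knowledge of a wear-out mechanism, not fitted

eta = weibayes_eta(failures, suspensions, beta)
print(f"Estimated eta: {eta:.0f} hours")
print(f"Estimated reliability at 3,000 hours: {weibull_reliability(3000.0, eta, beta):.3f}")
```

The result is only as good as the assumed beta, so it is worth rerunning the calculation over a plausible range of beta values to see how much the estimate moves.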

Condition Monitoring or Degradation-Based Approach

Another option, again relying on an understanding of the failure mechanisms and on access to the units still in service, is to map the progress toward failure for the other items. If we have two failed meters, for example, due to excessive brush wear, we could measure brush wear on a sample of the remaining units to create a degradation model and estimate the remaining operating life for the population.
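To make the brush wear idea concrete, here is a minimal sketch that assumes wear grows roughly linearly with operating hours. The measurements, the wear limit, and the linear model itself are hypothetical; a real degradation path may well be nonlinear and vary from unit to unit.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def hours_to_limit(slope, intercept, wear_limit):
    """Operating hours at which the fitted line reaches the wear limit."""
    if slope <= 0:
        raise ValueError("wear must increase with hours for this estimate")
    return (wear_limit - intercept) / slope


# Hypothetical sample: one wear measurement per surviving unit.
hours = [500.0, 1000.0, 1500.0, 2000.0, 2500.0]
wear_mm = [1.1, 2.0, 2.9, 4.2, 5.0]
wear_limit_mm = 8.0  # assumed wear at which the brush no longer works

slope, intercept = fit_line(hours, wear_mm)
print(f"Estimated wear rate: {slope * 1000:.2f} mm per 1,000 hours")
print(f"Estimated hours to reach limit: {hours_to_limit(slope, intercept, wear_limit_mm):.0f}")
```

The wear measured on the two failed units, if available, is a natural check on the assumed limit.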

Lots of ifs here, yet it is an option if the situation fits.

The Least Useful Option — MTBF

Finally, one could, though I’m not sure why, estimate the total time of operation for the population, including the two items that have failed, and calculate MTBF. You would calculate a number that may be satisfying, yet, as you know, not very useful for any practical purpose.
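For completeness, the arithmetic is just total operating hours divided by the number of failures. The hours below are made up for illustration.

```python
def mtbf(total_operating_hours, failure_count):
    """Total operating time divided by the number of failures."""
    if failure_count == 0:
        raise ValueError("MTBF is undefined with zero failures")
    return total_operating_hours / failure_count


# Hypothetical data: 2 failures and 48 units still running at 2,000 hours.
failure_hours = [1200.0, 1800.0]
survivor_hours = [2000.0] * 48

total_hours = sum(failure_hours) + sum(survivor_hours)
mtbf_hours = mtbf(total_hours, len(failure_hours))
print(f"MTBF: {mtbf_hours:.0f} hours")
```

Notice that the single number says nothing about whether the failures are early-life or wear-out failures, which is exactly the information needed to predict the next one.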

Summary

The more you know about the two failures, the better. Ask the questions before fielding your units, before failures occur. After failures occur, you may not have the same range of options available to estimate system reliability.

What have you learned from a couple of failures? How did you treat the information?

Comments

Mark Powell says

September 6, 2017 at 10:24 AM

Fred,

Usually, if you only have two failures you are talking about a part designed to have a high reliability. This of course means that you expect to have few failures, ever.

So when you asked the question “What Information Do We Have Available?”, you forgot to mention that you may have a ton of suspension or survivor data. There is a wealth of information that can really help define the failure distribution if you can use the survivor data.

I will refer back to the “What’s the Fuss” article on no-MTBF.com for how to best solve this problem that many face.

Mark Powell

- Fred Schenkelberg says
  
  September 6, 2017 at 5:00 PM
  
  Hi Mark,
  
  You are right, I did not explicitly talk about the suspensions and how to treat them. I also failed to mention comparing what we expected to fail with what did fail, as feedback on our estimates. If the two failures occurred as expected, then our design-time estimates are supported (for now). If the failures were surprises (either too few too late, or too many too early, or a different failure mechanism), we learn something and may have to adjust our estimates or our ability to gather information concerning failures.
  
  Cheers,
  
  Fred