Let’s Talk About an Appropriate Reaction to an MTBF Requirement

Every now and then we receive a customer request concerning reliability. If asked, most customers would prefer no failures, low cost of maintenance or ownership, and trouble-free long-term performance. Many also realize that failures do occur. Thus a series of discussions takes place to find an economically viable solution for both parties. Part of this discussion may include a poorly worded reliability requirement.

How you respond can help improve the discussion and accelerate finding the right solution.

Example Requirement

Here’s a situation that I heard about recently.

I am looking for the correct way to explain to my client that there is something wrong with it. I would appreciate your help on how you would approach the problem below.

There is an electromechanical system which consists of motors, relays, switches, limit switches, huge mechanical structural parts, etc. Consider around 100+ pieces of equipment. The client has asked for an MCBF (since it operates in cycles) of 1,000,000 for the whole system. How do you react to this requirement?

We might need a little more information to fully respond, yet in general how should we react to this requirement? What questions should you ask to fully understand the requirement?

Basic response

My initial response centers on one question: what do they really want? An MCBF value alone is not very informative, and the request may simply reflect how little is known about the system so far. My initial questions would attempt to flesh out the four elements of a full reliability definition: function, environment, duration, and probability of success.

Is the system a series of these parts? Are there any parallel (reliability-wise) elements?
How many elements make up a system?
What is the system supposed to do?
How many cycles per day are expected to occur?
Over what duration is this system expected to operate?
How few or many could occur?
What is the environment (temperature, humidity, rain, dust, power, etc.)?
If able or expected to do corrective maintenance, how long will that take and at what expense?
Does this equipment have preventative maintenance requirements?
Does this equipment have degradation over time leading to failure that we can predict?
What are the dominant failure mechanisms for this equipment?
How does the failure rate change over time?

Understanding the Stated Requirement

Now, back to the specific request: an MCBF of 1,000,000 cycles.

Let’s make a few assumptions so we can do some simple calculations.

Let’s say we expect 4 cycles of use per hour, 24 hours per day, 365 days a year for 10 years; that is 350,400 cycles. Of course, this is a complete guess on my part, yet we would need to know the number of cycles and over what duration to understand the request.
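The cycle count is just the product of the usage-profile factors. A minimal Python sketch, assuming the guessed profile above (the function name `mission_cycles` is purely illustrative), makes it easy to rerun the numbers once the customer supplies real usage data:

```python
def mission_cycles(cycles_per_hour: float, hours_per_day: float,
                   days_per_year: float, years: float) -> float:
    """Total cycles accumulated over the mission duration."""
    return cycles_per_hour * hours_per_day * days_per_year * years

# Guessed usage profile from the text: 4 cycles/hour, 24 h/day, 365 days/year.
ten_years = mission_cycles(4, 24, 365, 10)
one_year = mission_cycles(4, 24, 365, 1)
print(f"{ten_years:,.0f} cycles in 10 years, {one_year:,.0f} cycles in 1 year")
# prints "350,400 cycles in 10 years, 35,040 cycles in 1 year"
```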

We can use the reliability function for the exponential distribution, where t is the number of cycles and θ is the MCBF, to calculate the probability of the system operating without failure over a given period (number of cycles). For a ten-year duration, or 350,400 cycles, we find

$latex \displaystyle&s=4 R(t)={{e}^{-\left( \frac{t}{\theta } \right)}}$

$latex \displaystyle&s=4 R(350,400)={{e}^{-\left( \frac{350,400}{1,000,000} \right)}}=0.704$

In other words, the requirement amounts to a 70% chance of the system operating with no failures over 10 years of use. Is this what they want or expect?

How about for just one year or 35,040 cycles?

$latex \displaystyle&s=4 R(35,040)={{e}^{-\left( \frac{35,040}{1,000,000} \right)}}=0.966$

That sounds better, yet is it what they expect: that the system has about a 97% chance of operating for a year without failure?
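A short Python sketch of the same exponential reliability function, assuming a constant failure rate and the guessed 35,040 cycles per year, reproduces both figures and shows how the chance of failure-free operation falls as the duration grows:

```python
import math

def reliability_exponential(cycles: float, mcbf: float) -> float:
    """R(t) = exp(-t / theta): probability of zero failures over `cycles`,
    assuming a constant failure rate per cycle (exponential distribution)."""
    return math.exp(-cycles / mcbf)

MCBF = 1_000_000          # the stated system requirement, in cycles
CYCLES_PER_YEAR = 35_040  # assumed usage: 4 cycles/hour, 24 h/day, 365 days/year

for years in (1, 2, 5, 10):
    r = reliability_exponential(years * CYCLES_PER_YEAR, MCBF)
    print(f"{years:>2} year(s): R = {r:.3f}")
# prints R = 0.966, 0.932, 0.839, 0.704 for 1, 2, 5 and 10 years
```

Tabulating several durations like this is a quick way to ask the customer which row actually matches their expectation.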

How about the specific equipment requirements?

We do not have much detail about the system, yet we do know it has over 100 pieces of equipment, including motors, relays, etc. Let’s estimate how reliable a single motor has to be if we want to achieve the overall system requirement of 1,000,000 MCBF, or about 97% reliability over one year.

Assuming the motors are in series reliability-wise, we multiply the reliability of each motor to obtain the system reliability. Of course, this is a simplification: I’m treating the system as 100 identical motors and ignoring the remaining equipment. The same basic logic applies; the math and details are just slightly more complicated.

$latex \displaystyle&s=3 {{R}_{sys}}(t)={{R}_{1}}(t)\times {{R}_{2}}(t)\times \cdots \times {{R}_{100}}(t)={{\left( {{R}_{motor}}(t) \right)}^{100}}$

$latex \displaystyle&s=3 {{R}_{sys}}(35,040)={{\left( {{R}_{motor}}(35,040) \right)}^{100}}={{\left( {{e}^{-\left( \frac{35,040}{{{\theta }_{motor}}} \right)}} \right)}^{100}}$
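Because multiplying identical exponential terms adds their exponents, the solution can be written out directly by setting the motors’ combined reliability equal to the system target (same series, constant-failure-rate assumptions as above):

$latex \displaystyle&s=3 {{\left( {{e}^{-\left( \frac{t}{{{\theta }_{motor}}} \right)}} \right)}^{100}}={{e}^{-\left( \frac{100\,t}{{{\theta }_{motor}}} \right)}}={{e}^{-\left( \frac{t}{{{\theta }_{sys}}} \right)}}\Rightarrow {{\theta }_{motor}}=100\,{{\theta }_{sys}}=100\times 1,000,000={{10}^{8}}$

$latex \displaystyle&s=3 {{R}_{motor}}(35,040)={{e}^{-\left( \frac{35,040}{{{10}^{8}}} \right)}}\approx 0.99965$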

Solving for the motor MCBF, we find each motor needs an MCBF of about 10⁸ cycles. Of course, the motors would have to be even more reliable, as we did not consider the other elements of the system. At one year, each motor would need about a 99.96% probability of running without failure (in practice, even higher).
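To see why counting the other equipment pushes each item’s requirement even higher, here is a rough Python sketch of equal apportionment across n identical series items with constant failure rates. The count of 150 is a hypothetical stand-in for “100+ items” once relays, switches, etc. are included; the function names are illustrative only.

```python
import math

def required_item_mcbf(system_mcbf: float, n_items: int) -> float:
    """MCBF each of n identical series items needs (constant failure rates).
    Failure rates add in series: 1/theta_sys = n / theta_item."""
    return n_items * system_mcbf

def required_item_reliability(system_r: float, n_items: int) -> float:
    """Equal apportionment: each item's reliability is the n-th root of the target."""
    return system_r ** (1.0 / n_items)

SYSTEM_MCBF = 1_000_000
ONE_YEAR = 35_040  # cycles, under the guessed usage profile
system_r = math.exp(-ONE_YEAR / SYSTEM_MCBF)

# 100 is the simplification used in the text; 150 is a hypothetical item count.
for n in (100, 150):
    print(f"n={n}: item MCBF = {required_item_mcbf(SYSTEM_MCBF, n):.1e} cycles, "
          f"item R(1 year) = {required_item_reliability(system_r, n):.5f}")
```

Real equipment would not be apportioned equally, but the trend holds: every item added in series raises the bar for all the others.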

It may be possible to source motors with very high reliability, yet that isn’t the approach we should be taking given how motors work and fail.

Better questions to ask

If we stay with the assumption of a constant hazard rate (exponential distribution) for motors, we are likely to make a serious miscalculation and/or fail to meet the system reliability expectations by design.

A better question to ask about the motors is how they will fail. While in this case we do not know the specifics of the motor, it may be fair to say the motor bearings will wear out over time. The chance of failure increases over time as the bearings lose their ability to reduce friction.

This suggests that while a motor may be able to meet a very high first-year reliability requirement, over time the same motor will wear out and fail. By assuming a constant chance of failure per cycle, we may be overestimating early failures and underestimating later failures.
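A small Python comparison illustrates the point. It assumes a hypothetical wear-out motor (Weibull shape β = 3, scale η = 250,000 cycles; made-up values, not vendor data) and an exponential model with the same mean cycles to failure, η·Γ(1 + 1/β), which is the MCBF a constant-failure-rate analysis would use:

```python
import math

# HYPOTHETICAL wear-out parameters, for illustration only (not vendor data).
BETA = 3.0        # Weibull shape; beta > 1 means an increasing failure rate
ETA = 250_000.0   # Weibull scale (characteristic life), in cycles

# Exponential model with the same mean cycles to failure as the Weibull.
THETA = ETA * math.gamma(1.0 + 1.0 / BETA)

def unreliability_weibull(t: float, beta: float = BETA, eta: float = ETA) -> float:
    """F(t) = 1 - exp(-(t/eta)^beta): probability of failure by cycle t."""
    return 1.0 - math.exp(-((t / eta) ** beta))

def unreliability_exponential(t: float, theta: float = THETA) -> float:
    """F(t) = 1 - exp(-t/theta): same mean, but a constant failure rate."""
    return 1.0 - math.exp(-t / theta)

CYCLES_PER_YEAR = 35_040  # guessed usage profile from the text
for years in (1, 5, 10):
    t = years * CYCLES_PER_YEAR
    print(f"{years:>2} yr: Weibull F = {unreliability_weibull(t):.3f}, "
          f"exponential F = {unreliability_exponential(t):.3f}")
```

With these made-up parameters, the exponential model predicts roughly 15% failing in the first year versus about 0.3% for the wear-out model, yet by year ten it predicts fewer failures (about 79% versus 94%).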

Ask the motor vendors for failure mechanism information and Weibull distribution parameters (or an accurate description of failure probability over time or cycles).
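Once vendors supply Weibull parameters, they can be combined into a system estimate. The Python sketch below assumes a purely series system and uses an entirely hypothetical bill of materials (component names, counts, β and η are placeholders to be replaced with vendor data):

```python
import math
from dataclasses import dataclass

@dataclass
class Component:
    name: str
    count: int     # number of identical units in series
    beta: float    # Weibull shape parameter (ideally from the vendor)
    eta: float     # Weibull scale parameter, in cycles (ideally from the vendor)

    def reliability(self, cycles: float) -> float:
        # Probability that all `count` units survive `cycles` cycles.
        return math.exp(-((cycles / self.eta) ** self.beta)) ** self.count

def system_reliability(components: list[Component], cycles: float) -> float:
    """Series system: every component must survive, so reliabilities multiply."""
    r = 1.0
    for c in components:
        r *= c.reliability(cycles)
    return r

# HYPOTHETICAL parameters for illustration only; replace with vendor data.
bill_of_materials = [
    Component("motor", 20, beta=2.5, eta=5e7),
    Component("relay", 40, beta=1.2, eta=2e8),
    Component("limit switch", 40, beta=1.5, eta=1e8),
]

for years in (1, 10):
    cycles = years * 35_040  # guessed usage profile from the text
    print(f"{years:>2} yr: R_sys = {system_reliability(bill_of_materials, cycles):.4f}")
```

Any parallel (redundant) elements or maintenance actions would change this simple product, which is one more reason to ask about the architecture up front.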

Summary

In general, a requirement of 1,000,000 MCBF is fairly meaningless unless we also know the duration over which to determine the reliability. We certainly would need to understand the need for failure-free operation, availability, and the ability to conduct preventative and corrective maintenance, along with the basic reliability architecture (parallel, series, or a mix).

For the specific elements we also should understand the failure mechanisms and time to failure patterns.

My first response stands: I would ask them what they really want and get a better sense of their expectations around system reliability.