Let’s say we have a product that most often fails because of one major component. Let’s say it is a fan (it could be anything, and while I don’t have anything against fans, it’s easy to picture).
Ok, this fan has a data sheet with the classic reliability claim of 50,000 hours MTBF. For those who know about my disdain for MTBF (www.nomtbf.com), rest assured I’m not going to get into it here. The basic approach for estimating the number of failures during any period of time does require a few pieces of information. MTBF is common on data sheets, so, in this case, that’s where we start.
Without any other information about the life distribution and given only MTBF, we will have to use the exponential distribution. The cumulative distribution function is
$$ F(t) = 1 - e^{-t/\theta} $$
where F(t) is the probability of failure up to time t, and theta, θ, is the MTBF.
The next piece of information we need is the warranty period, or whatever period of time is of interest. In this case, let’s say it’s three years. Since the fan is the primary concern in this simple example, we also need the duty cycle of the fan within the product. For ease, let’s say the fan is working full time (in a server product, for example). That means the fan will operate for 365 days x 24 hours x 3 years = 26,280 hours.
Now we’re ready to do the calculation.
t = 26,280 hours
θ = 50,000 hours
Using the equation above, F(26,280) = 1 - e^(-26,280/50,000) ≈ 0.41, so we would expect about 41% of the fans to fail by three years. The time here is the age of the individual units, not the time since production started. In short, a lot would fail. How many?
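The 0.41 figure can be checked with a few lines of Python. This is only a sketch of the exponential CDF calculation using the values from the example; the function name `failure_probability` is made up for illustration.

```python
import math

def failure_probability(t_hours, mtbf_hours):
    """Exponential CDF: probability that a unit fails by age t_hours."""
    return 1.0 - math.exp(-t_hours / mtbf_hours)

t = 365 * 24 * 3   # 26,280 operating hours in three years of 24/7 use
theta = 50_000     # data-sheet MTBF in hours

p_fail = failure_probability(t, theta)
print(f"F({t}) = {p_fail:.4f}")   # rounds to 0.41
```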
We need to know how many units have shipped or are expected to ship. Let’s say we expect to produce 10,250 of these products. How many will come back under warranty due to fan failure?
10,250 x 0.41 = 4,202.5, or roughly 4,200 fan failures.
Multiply the number of warranty failures by the cost of a warranty return to find the amount of warranty reserve to set aside.
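As a rough sketch, the whole chain from MTBF to warranty reserve looks like this in Python. The cost per return (`cost_per_return`) is a made-up placeholder, since the example above does not give one; substitute your own figure.

```python
import math

units_shipped = 10_250
mtbf_hours = 50_000
warranty_hours = 365 * 24 * 3   # three years of full-time operation
cost_per_return = 120.00        # hypothetical cost of one warranty return

# Probability that a single fan fails within the warranty period
p_fail = 1.0 - math.exp(-warranty_hours / mtbf_hours)

expected_failures = units_shipped * p_fail
warranty_reserve = expected_failures * cost_per_return

print(f"Expected fan failures: {expected_failures:,.0f}")
print(f"Warranty reserve: {warranty_reserve:,.2f}")
```

Carrying the unrounded probability through gives about 4,190 failures rather than 4,202.5, because the hand calculation rounds F(t) to 0.41 first. Either way, the estimate is only as good as the exponential assumption behind it.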
If you have any questions or would like to see other examples, please leave a comment.
Hilaire Perera says
MTBF/MTTF as single point estimates are “risky”. It is better to use the lower confidence limit of these numbers when calculating reliability or allocating spares.
Michael Li says
Hi,
Following the formula, exp(-t/MTBF) = 0.59, and 1 minus 0.59 equals 0.41, so 0.41 would be the probability of failure. Is that true?
Regards,
Michael
Fred Schenkelberg says
Yes, Michael, that is true, the reliability function is as you describe it, and 1 – R(t) is the CDF which provides the probability of failure over the duration t. I forgot to subtract the reliability function (probability of success) from one. It’s updated now.
Michael Li says
This is a good article that helped me work out the relationship between MTBF and warranty.
KESAVA says
How would I calculate the warranty cost for repairable products if I have the MTBF and mission time?
Thanks,
Kesava.
Fred Schenkelberg says
Hi Kesava,
Pretty much the same way as in the article. If you have one piece of equipment, then skip the last part about how many units are running.
The hard part is that with only MTBF you can only estimate the expected number of failures or the probability of failure over some duration. You need to be sure the MTBF value is valid over the time period of interest. If the value is based on the first year of operation, it may not be accurate for the second year, and may be very inaccurate for the 10th year.
Another way to think of the problem is that MTBF is just the inverse of the failure rate. Given the failure rate per hour and how many hours you expect to run, calculate the number of expected failures.
You also need the cost of repair – or replacement.
If you really want to estimate the warranty cost of a repairable system, you really should understand the failure distributions for the repairable items and the overall system (a reliability block diagram comes to mind here) and then estimate the costs based on which element of the system fails. A bit more complicated, yet a whole lot more accurate.
Cheers,
Fred
Asfour says
Hi Fred,
First, thanks for the effort and the follow-up in answering others’ queries. My issue is related to warranty calculations for devices, which range from DDC controllers to various types of sensors, active and passive. One painful argument is how much spares cost should be budgeted for the warranty phase, which varies from 1 to 3 years.
The MTBF for the devices is known, but when I try the available formulas (and I have tried a lot), the results do not make sense, since failures at that rate are not actually happening. By failure I mean the device needs to be replaced, not maintained. Can you help here?
Fred Schenkelberg says
Hi Asfour,
With actual field data, shipments and returns, better if you know the date of shipment or installation, and date of the return for specific serial numbers, you can sort out the time to failure distribution. I often start with Weibull and see how well that works. With that data, you have a representation of the actual rate of field failures and can estimate future failures as well.
Using the MTBF or MTTF of components, or any parts-count type estimate of reliability, is rarely, and only by luck, going to represent the actual field reliability performance. Using field data to calculate MTTF or MTBF likewise provides a crude estimate that does not include the changing nature of the failure rate as the item ages.
So, do not use MTBF. Use the field data you have.
Cheers,
Fred
Tom Nolan says
I am looking for the best way to calculate parts replaced and returned from the field. We currently use Predicted Annual Failure Rate (PAFR). Is there any other method for this calculation? I have also had a request to calculate the return rate. Do you know if that is possible?
Failure Rate (PAFR) = the expected quantity of parts returned to the OEM that are actually defective. This excludes NTFs (no trouble found). It is expressed as a percentage of the component IB, annualised.
Return Rate = the expected quantity of parts returned from the field from Veritas’ service partner to our OEM, expressed as a percentage of the component IB, annualised.
Fred Schenkelberg says
Hi Tom,
First off keep in mind that the annualized failure rate is an average and thus not informative on any changes to the rate of returns.
Second, always count NTFs. A very easy way to make the return rate look better is to classify more returns as NTF. Besides, if you have NTFs there is still something to solve, or else customers would not be returning the units to you.
Third, better is to use the field return data directly to fit Weibull or appropriate distribution to the data – then use that information to predict returns each month going forward. Weibull++ has a handy tool to analyze and predict.
Fourth, before shipping, you can use the development reliability block diagram and current reliability estimates to estimate warranty returns. You’ll need an estimate of weekly or monthly shipments as well.
Cheers,
Fred
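To illustrate the idea of predicting monthly returns from a fitted distribution plus a shipment plan, here is a minimal Python sketch. Everything numeric in it is hypothetical: the Weibull shape and scale (`beta`, `eta`, in months), the shipment quantities, and the function names are assumptions for illustration, not values from this thread. It also assumes returned units are not replaced and ignores the end of the warranty period.

```python
import math

def weibull_cdf(t, beta, eta):
    """Probability a unit has failed by age t (same time units as eta)."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / eta) ** beta))

def forecast_returns(shipments, beta, eta, horizon):
    """Expected returns in each calendar month 0..horizon-1.

    shipments[s] is the number of units shipped at the start of month s.
    """
    monthly = []
    for m in range(horizon):
        total = 0.0
        for s, n in enumerate(shipments):
            age = m - s   # cohort age at the start of calendar month m
            if age >= 0:
                # Fraction of this cohort expected to fail during month m
                total += n * (weibull_cdf(age + 1, beta, eta)
                              - weibull_cdf(age, beta, eta))
        monthly.append(total)
    return monthly

# Hypothetical inputs: six monthly shipments and an assumed fitted Weibull
shipments = [600] * 6
beta, eta = 1.5, 60.0
horizon = 12

monthly = forecast_returns(shipments, beta, eta, horizon)
for month, expected in enumerate(monthly):
    print(f"month {month:2d}: {expected:6.2f} expected returns")
```

Each shipment cohort ages separately, so the forecast reflects a failure rate that changes with age, which a single annualized rate cannot do.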
Vijay says
Hi Fred,
Thanks for the example.
You arrived at 4202.5 failures based on CDF*number of fans.
What if we approach this from an expected number of failures view?
For a component having a constant failure rate, the expected number of failures follows a Poisson process with a mean of n*λ*t.
Therefore, the expected number of failures over time (26,280 hrs) = 10,250*1/50,000*26,280 = 5387.4, which is vastly different from 4202.5.
Which one is the correct methodology?
Thanks
Fred Schenkelberg says
Hi Vijay,
I do not think either is appropriate or very accurate, as very few things, if any, follow a constant failure rate. Better to understand the driving failure mechanism and model the time-to-failure behavior.
Cheers,
Fred
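For readers wondering why the two numbers in Vijay's question differ, one way to look at it (my reading, not part of Fred's reply) is that they answer different questions under the same constant failure rate assumption. The CDF approach counts units that fail at least once. The n*λ*t approach counts every failure when each failed fan is replaced right away and the replacement can fail too. A small Python sketch with the article's numbers:

```python
import math

n_units = 10_250
mtbf_hours = 50_000
t_hours = 26_280
failure_rate = 1.0 / mtbf_hours   # lambda, failures per hour

# Units expected to fail at least once (each unit counted once)
units_failing = n_units * (1.0 - math.exp(-failure_rate * t_hours))

# Total failures if every failed unit is immediately replaced
# (a Poisson process with mean n * lambda * t)
total_failures = n_units * failure_rate * t_hours

print(f"Units failing at least once: {units_failing:.1f}")
print(f"Total failures with replacement: {total_failures:.1f}")
```

The second number is always the larger of the two, and the gap widens as t grows relative to the MTBF. Neither fixes the underlying problem Fred points out: real items rarely have a constant failure rate.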
William Thorlay says
Hi Fred,
Considering a duty cycle of 12 h/day, should I use only these 12 hours when calculating F(t) for 3 years? And if I am a maintenance engineer, should I use the downtime hours to calculate F(t), or assume that the downtime is not representative and just use the period of time for which I want to know this particular F(t)?
Fred Schenkelberg says
Hi William, both good questions. Yes, adjust the time element to reflect the duty cycle and be clear about what 3 years represents – i.e. not 24/7 operation. For the maintenance example, downtime is fine, yet you most likely will want to know more than just an average. As with any set of data, adjust the analysis to help you learn or understand what is happening – the analysis should lead to better questions as you explore ways to make improvements or changes. cheers, Fred
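As a sketch of the duty cycle adjustment, the Python below compares 24 h/day and 12 h/day operation over three calendar years using the article's 50,000 hour MTBF. It assumes the fan only accumulates age while running and that switching on and off adds no stress of its own, which may not hold for a real product. The function names are illustrative.

```python
import math

def operating_hours(years, hours_per_day):
    """Operating hours accumulated over a calendar period."""
    return 365 * hours_per_day * years

def failure_probability(t_hours, mtbf_hours):
    """Exponential CDF evaluated at t_hours of operation."""
    return 1.0 - math.exp(-t_hours / mtbf_hours)

mtbf_hours = 50_000
results = {}
for hours_per_day in (24, 12):
    t = operating_hours(3, hours_per_day)
    results[hours_per_day] = failure_probability(t, mtbf_hours)
    print(f"{hours_per_day} h/day: t = {t} h, F(t) = {results[hours_per_day]:.3f}")
```

Halving the duty cycle halves t, but it does not halve F(t), because the CDF is not linear in time.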
Mark fiedeldey says
Fred,
I bet this was difficult for you to force yourself to write. MTBF is such a substandard metric. But thanks for the example.
Happy Easter,
Mark
Fred Schenkelberg says
Hi Mark, thanks for the note – many of my short tutorials are for those preparing for the ASQ CRE – yet, you know how I feel about using MTBF in any situation. cheers, Fred
Srinivas GS says
Hi Fred,
How can I predict the failure rate and future warranty claims if I have field failures of returned products for months 0 to 6? Assume 600 units are sold per month. What will the failure rate be in the 5th year, at the 60th month?
| Month | Failures | Qty sold |
|---|---|---|
| 0 | 3 | 600 |
| 1 | 11 | 600 |
| 2 | 17 | 600 |
| 3 | 23 | 600 |
| 4 | 23 | 600 |
| 5 | 21 | 600 |
| 6 | 3 | 600 |
Fred Schenkelberg says
Hi Srinivas,
It seems you have consistent shipments or items sold. The number of failed units in the table does not seem to be related to how old each unit was when it failed. Of the 17 that failed in month two, were those from month zero, one, or two? This matters because what you need is time-to-failure information for each failure, which also lets you sort out the time to censoring for the units still operating. With the ‘time to’ data you’re ready for what we commonly call Weibull analysis (regression analysis fitting a distribution to the data).
Enjoy the day (and the entire year) and best wishes.
Cheers,
Fred
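To make the "time to failure plus censored units" idea concrete, here is a hedged Python sketch of a Weibull fit. It uses maximum likelihood rather than the regression (probability plot) approach mentioned above, simply because it fits in a few lines of standard-library code. The data are simulated from made-up parameters (`true_beta`, `true_eta`, a 24 month observation window); they are not the figures from the table in the question, which cannot be fitted without unit ages.

```python
import math
import random

def fit_weibull(failures, censored):
    """Maximum-likelihood Weibull fit with right-censored data.

    failures: ages at failure; censored: ages of units still running.
    Returns (beta, eta), the shape and scale estimates.
    """
    t_max = max(list(failures) + list(censored))
    fail = [t / t_max for t in failures]   # rescale to avoid overflow
    every = fail + [t / t_max for t in censored]
    r = len(fail)
    mean_log_fail = sum(math.log(x) for x in fail) / r

    def score(beta):
        # Profile likelihood equation for the shape; its root is the MLE.
        s0 = sum(x ** beta for x in every)
        s1 = sum(x ** beta * math.log(x) for x in every)
        return s1 / s0 - 1.0 / beta - mean_log_fail

    lo, hi = 0.01, 50.0
    for _ in range(100):   # bisection: score increases with beta
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)
    eta = t_max * (sum(x ** beta for x in every) / r) ** (1.0 / beta)
    return beta, eta

# Simulated example: 2,000 units observed for 24 months (hypothetical values)
random.seed(1)
true_beta, true_eta = 1.8, 40.0
window = 24.0
ages = [random.weibullvariate(true_eta, true_beta) for _ in range(2000)]
failures = [a for a in ages if a <= window]
censored = [window for a in ages if a > window]   # still running at 24 months

beta_hat, eta_hat = fit_weibull(failures, censored)
print(f"beta = {beta_hat:.2f}, eta = {eta_hat:.1f} months")
```

With real data, `failures` would hold the age of each returned unit and `censored` the current age of every unit still in service. A dedicated package will also give confidence bounds, which this sketch does not.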
Bernadeth De Belen says
Hi Sir, can you help me with this one. It is required to produce a device having a reliability of at least 95% over a period of 500hr. Estimate the maximum permissible failure rate and minimum MTBF
Fred Schenkelberg says
Hi Bernadeth,
Given a minimum reliability of 95%, or 0.95, and given that the probability of failure over the time period (500 hrs) is related to reliability as R(t) = 1 – F(t), we know that over the 500 hrs no more than 5% of items can fail if you are to achieve the 95% reliability.
Now, MTBF. First, we really should not use it, for many reasons. If the underlying time to failure distribution is well described by the exponential distribution, you can use the first formula in the article and simply solve for theta (which is MTBF, in this case). If it is not an exponential distribution, then you’ll need a bit more information than just a desired reliability and duration. Oh, R(t) here is the given 0.95, so F(t) is 0.05, and t is 500 hours.
cheers,
Fred
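Solving the article's formula for theta gives θ = -t / ln(R(t)). A short Python sketch of that step, valid only if the exponential distribution really applies (variable names are illustrative):

```python
import math

reliability_target = 0.95   # R(t) required at t_hours
t_hours = 500

# From R(t) = exp(-t / theta): theta = -t / ln(R(t))
min_mtbf = -t_hours / math.log(reliability_target)
max_failure_rate = 1.0 / min_mtbf

print(f"Minimum MTBF: {min_mtbf:.1f} hours")
print(f"Maximum failure rate: {max_failure_rate:.3e} per hour")
```

That works out to a minimum MTBF of roughly 9,750 hours, or a maximum failure rate of about one failure per 9,750 operating hours.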
Dustin says
Great article Fred. I stumbled upon this when looking for other examples of how to perform a warranty risk calculation. The way I did my calculations was not using the cumulative distribution function, but assuming a constant probability of failure over time. By running my calculation and yours for a 3 year warranty period, our numbers come out pretty close. I found that interesting. I think it makes sense to use the cumulative distribution as it assumes you would have a lower failure rate when the part is just installed, however I don’t think this accounts for infant mortality. For that reason I wonder if using a constant failure rate would be better? In any case, as I said our numbers actually came out pretty close when I totaled the cost of 200 different vehicle parts over a 3 year period.
Fred Schenkelberg says
Hi Dustin, thanks for the note/question and for reading through the article. Be certain that the distribution fitted to the data actually is appropriate. If there is a mix of distributions due to differing dominant failure mechanisms, you may need two or more distributions to fit elements of the data.
Using a poorly fitted distribution, or assuming it’s close enough to constant, leads to under- or overestimating reliability or failure rates over different time periods. It also provides a false model of what is actually happening.
cheers,
Fred
Erik Johannes says
Fred, Thank you for your article. I have a system that is repairable. For each system component I know the MTBF. From your article I understand I can use the cumulative distribution function F(t)=1 – e^(-t/MTBF) to calculate the probability a component will fail before time t. My First Question: If I add all system component F(t) values for a given time t will the result be the probability of a failure of at least one component within the system before time t? My Second Question: If I add all the products of multiplying F(t) for a component by its Component Repair Cost will the result be an estimate of the repair cost at time t? – thx, Erik J
Fred Schenkelberg says
Hi Erik,
It’s easier to add the failure rates (1/MTBF values), then convert back to MTBF for use in the formula you mention. You can add the lambdas, not the MTBFs.
Using the CDF, you get the distribution of the time to first failure, for any reason.
Not sure about the component repair costs… best to run a simulation, which includes time and repair costs to get a better answer – a reliability block diagram approach may work well.
cheers,
Fred
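A small Python sketch of adding failure rates, under some stated assumptions: the components fail independently, each follows an exponential distribution, and any single component failure counts as a system failure (a series system). The component names and MTBF values are made up for illustration.

```python
import math

# Hypothetical component MTBFs in hours
component_mtbf_hours = {
    "fan": 50_000,
    "power_supply": 100_000,
    "controller": 200_000,
}
t_hours = 26_280

# Add the failure rates (lambdas), not the MTBFs
system_failure_rate = sum(1.0 / m for m in component_mtbf_hours.values())
system_mtbf = 1.0 / system_failure_rate

# Probability that at least one component fails by t_hours
p_any_failure = 1.0 - math.exp(-system_failure_rate * t_hours)

# For comparison: simply adding the component F(t) values overstates it
sum_of_cdfs = sum(1.0 - math.exp(-t_hours / m)
                  for m in component_mtbf_hours.values())

print(f"System MTBF: {system_mtbf:.0f} hours")
print(f"P(at least one failure): {p_any_failure:.3f}")
print(f"Sum of component F(t): {sum_of_cdfs:.3f}")
```

Adding the individual F(t) values double counts the cases where more than one component fails, so the sum runs high and can even exceed 1 for long durations. Adding the rates avoids that.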
Erik Johannes says
Thanks for your time.