A Question and my Response on MTTF
So this corrosion engineer walks into NoMTBF and sends me a message.
The Questions
Hi, I am a corrosion engineer. As you may know, API-581 defines and uses the term mean time to failure (MTTF) for the risk assessment of heat exchanger tube bundles.
Would you please give me more information about MTTF and what historical data is required to calculate it?
Thank you so much
My Response
Hi
First off, MTTF and similar metrics are meant for situations with a constant failure rate, meaning that in any given hour a piece of equipment has the same chance of failure as in any other hour.
This is generally not true, and it is certainly not true for corrosion failures. When the right conditions exist, corrosion starts, grows, and over time eventually leads to failure. The older the equipment, the more likely it is to fail due to corrosion, so the failure rate is not constant.
My advice is to avoid using MTTF or MTBF.
I would take a look at the models and data you have and use a Weibull or other life data distribution to model the time to failure. From there you can convert to an MTTF, although during the first half of the lifetime it will generally not be meaningful, often by a wide margin.
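A minimal sketch of that conversion, assuming a two-parameter Weibull with hypothetical wear-out parameters (beta = 3, eta = 20 years; these are not from any real corrosion data). The Weibull mean is eta × Γ(1 + 1/beta). Comparing R(t) from the Weibull against the exponential R(t) implied by the same MTTF shows how far apart the two can be early in life.

```python
import math

def weibull_reliability(t, beta, eta):
    """Probability of surviving to time t for a 2-parameter Weibull."""
    return math.exp(-((t / eta) ** beta))

def weibull_mttf(beta, eta):
    """Mean of a 2-parameter Weibull: eta * Gamma(1 + 1/beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)

def exponential_reliability(t, mttf):
    """Reliability implied by treating the MTTF as a constant failure rate."""
    return math.exp(-t / mttf)

# Hypothetical wear-out parameters, time in years.
beta, eta = 3.0, 20.0
mttf = weibull_mttf(beta, eta)
print(f"MTTF = {mttf:.1f} years")
for t in (2, 5, 10, 15):
    print(f"t={t:>2}  Weibull R={weibull_reliability(t, beta, eta):.3f}  "
          f"exponential R={exponential_reliability(t, mttf):.3f}")
```

With a wear-out shape, the exponential curve built from the same MTTF understates survival in the early years, which may be exactly the time frame a risk assessment cares about.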
I would ask the risk analysis folks for which time frames they need failure rate information and provide estimates suitable for each time frame. An overall MTTF is pretty misleading and may distort the risk assessment results.
If you’d like to talk about better ways to work between reliability and risk assessments, let me know.
Just dawned on me I didn’t answer your question.
The data you need for the calculation is the total hours of operation of the equipment, which you divide by the number of failures. Pretty simple. So if you have 100 pumps, and all but one run for 100 hours while the one fails at, say, 50 hours, then the calculation is ((99 × 100) + (1 × 50)) / 1 failure, for an MTTF of 9,950 hours.
If there are no failures, still tally the operating hours and divide by one (rather than by zero, where bad things happen mathematically).
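A minimal sketch of that tally, reproducing the 100-pump example. The function name `mttf_from_hours` is hypothetical; the divide-by-one rule for zero failures follows the text.

```python
def mttf_from_hours(operating_hours, failures):
    """Total operating hours divided by the number of failures.

    With zero failures, divide by one instead of zero, per the rule above.
    """
    total = sum(operating_hours)
    return total / max(failures, 1)

# 100 pumps: 99 ran 100 hours each, one failed at 50 hours.
hours = [100.0] * 99 + [50.0]
print(mttf_from_hours(hours, failures=1))
```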
Summary
Two things.
- Be sure that what you are measuring and reporting with MTTF actually has a constant failure rate, or one close enough to constant that it doesn’t matter.
- Send over your questions and maybe become the next NoMTBF blog post.
Third and most important, if you have a NoMTBF button or mug or whatever, please take a picture of it in the wild and send it over. Also send along a short note on how it has helped start conversations around MTBF.
I’ll create a page on the site showcasing the ways these devices are starting conversations.
Arash Karpour says
Hi Fred,
Great explanation. Thank you.
Would you please help with this example,
One plant has 10 components, and for each component I have a short list of the operating hours at which it failed.
Without using software, could you point me in the right direction on how to calculate the MTTF and the reliability of the plant? Any small numeric example would shed some light.
Would using a histogram help with this calculation?
Thank you very much for your time.
Regards,
Arash
Fred Schenkelberg says
Hi Arash,
Estimating MTTF is pretty simple (though not always useful; see http://www.nomtbf.com).
Tally up the time the individual units (the 10 components) have been operating and divide by the number of failures.
For the overall system it’s the same: how long the system has been running divided by the number of failures.
For many systems you really want to know the availability, not the MTTF or MTBF, especially if the total repair time is greater than about 1% of the total time. In that case, look at availability.
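A quick way to check the 1% rule of thumb, sketched with a hypothetical year of plant history (the uptime and downtime figures are made up for illustration):

```python
def availability(uptime_hours, downtime_hours):
    """Fraction of total time the system was available to operate."""
    return uptime_hours / (uptime_hours + downtime_hours)

def repair_fraction(uptime_hours, downtime_hours):
    """Share of total time spent down for repair."""
    return downtime_hours / (uptime_hours + downtime_hours)

# Hypothetical plant history for one year (8,760 hours).
up, down = 8_600.0, 160.0
print(f"repair share of time: {repair_fraction(up, down):.2%}")
print(f"availability: {availability(up, down):.4f}")
```

Here the repair share is about 1.8%, above the 1% threshold, so availability is the more useful figure to report.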
So say I have five units (of one of the 10 components): four have failed and one is still operating. Let’s say that all four failed at 25 hours of operation, and the one still running has been doing so for 20 hours. The total operating time is 25 + 25 + 25 + 25 + 20 = 120 hours, and there are four failures, so MTTF = 120 / 4 = 30 hours.
Cheers,
Fred
Arash says
Thank you Fred for the explanation. Helpful as always.
As you mentioned in the previous posts, to calculate the MTTF we assume a constant failure rate; otherwise I have to use a Weibull. In that case, is there a way to estimate the alpha and beta parameters for the components?
For the system’s reliability, I use block diagrams (of how the components are related in the system) to calculate it. What is your opinion?
Cheers,
Arash
Fred Schenkelberg says
Hi Arash,
If you assume a constant rate of failure, then MTTF is the only parameter you need to estimate. Using the exponential distribution you can calculate the reliability at any point in time, R(t)=exp[-t/mttf]
Assuming a constant failure rate is the same as setting the Weibull beta term to 1.
If you have time to failure data, then fitting a Weibull to the data is one way to get the estimate of the beta and eta parameters. Another way is to use published or previous studies for the specific failure mechanism.
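A by-hand sketch of fitting a Weibull to complete time-to-failure data, using median rank regression with Bernard's approximation (one common hand method; maximum likelihood is another). The failure times are hypothetical, and every unit is assumed to have failed (no censored units).

```python
import math

def bernard_median_rank(i, n):
    """Bernard's approximation to the median rank of the i-th of n failures."""
    return (i - 0.3) / (n + 0.4)

def fit_weibull_mrr(times):
    """Estimate (beta, eta) by rank regression on Y.

    Linearizes F = 1 - exp(-(t/eta)**beta) as
    ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta). Complete data only.
    """
    t_sorted = sorted(times)
    n = len(t_sorted)
    xs = [math.log(t) for t in t_sorted]
    ys = [math.log(-math.log(1.0 - bernard_median_rank(i, n)))
          for i in range(1, n + 1)]
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx                    # slope of the Weibull plot
    intercept = y_bar - beta * x_bar    # equals -beta * ln(eta)
    eta = math.exp(-intercept / beta)
    return beta, eta

# Hypothetical failure times in hours, all units failed.
hours = [120, 190, 250, 310, 360, 410, 480, 560]
beta, eta = fit_weibull_mrr(hours)
print(f"beta = {beta:.2f}, eta = {eta:.0f} hours")
```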
For a system with different components that each have different time to failure distributions, simply use the reliability function for each component and solve for the time of interest.
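A sketch of that idea for a small hypothetical block diagram: a pump with a wear-out Weibull life in series with a redundant pair of controllers that have a constant failure rate. All parameters are invented for illustration, and the parallel block assumes both controllers run at once and fail independently (active redundancy).

```python
import math
from functools import partial

def weibull_r(t, beta, eta):
    """Weibull reliability R(t) = exp(-(t/eta)**beta)."""
    return math.exp(-((t / eta) ** beta))

def exponential_r(t, mttf):
    """Constant-failure-rate reliability R(t) = exp(-t/mttf)."""
    return math.exp(-t / mttf)

def series_r(t, components):
    """Series block: the system survives only if every component survives."""
    r = 1.0
    for reliability_fn in components:
        r *= reliability_fn(t)
    return r

def parallel_r(t, components):
    """Active-parallel block: the block fails only if every component fails."""
    q = 1.0
    for reliability_fn in components:
        q *= 1.0 - reliability_fn(t)
    return 1.0 - q

# Hypothetical diagram: pump (wear-out) in series with two redundant controllers.
pump = partial(weibull_r, beta=2.5, eta=20_000.0)
controller = partial(exponential_r, mttf=50_000.0)
controllers = partial(parallel_r, components=[controller, controller])
system = partial(series_r, components=[pump, controllers])

for hours in (1_000, 5_000, 10_000):
    print(hours, round(system(hours), 4))
```

Each block only needs a function of time, so components with different distributions combine without any MTTF conversion.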
Cheers,
Fred
Arash says
Thank you Fred. I appreciate your time and support.
Arash says
Hi Fred,
For every cumulative distribution function F(t) point on the Weibull plot, there would be two associated points, the upper and lower 95% confidence limits (UCL and LCL).
How do I calculate these two points for each modified time to failure (Bernard’s approximation)?
Thank you
Fred Schenkelberg says
Hi Arash,
Good sources for information on reliability statistics and calculations are weibull.com and reliasoft.com.
http://www.weibull.com/hotwire/issue101/relbasics101.htm
for example has a discussion on calculating confidence bounds.
Also a good data analysis statistics book will cover this material.
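For rank-based bounds on each plotted F(t) point, one common approach (often called beta-binomial or 5%/95% ranks) treats the rank of the i-th of n failures as the i-th order statistic of a uniform sample. The sketch below solves for those ranks with only the standard library, assuming complete (uncensored) data. It is one way to set the bounds; Fisher matrix and likelihood ratio bounds are other common choices.

```python
import math

def binomial_sf(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

def rank_bound(i, n, level):
    """F such that P(Binomial(n, F) >= i) = level.

    This is the `level` quantile of the i-th of n ordered ranks:
    0.5 gives the exact median rank, 0.025 and 0.975 bracket a
    two-sided 95% band. binomial_sf rises with p, so bisection works.
    """
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if binomial_sf(i, n, mid) < level:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

n = 10  # hypothetical sample size: 10 failures, no censoring
for i in range(1, n + 1):
    print(f"{i:>2}  LCL={rank_bound(i, n, 0.025):.4f}  "
          f"median={rank_bound(i, n, 0.5):.4f}  UCL={rank_bound(i, n, 0.975):.4f}")
```

The median column is the exact median rank, which Bernard's approximation (i - 0.3) / (n + 0.4) tracks closely.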
Cheers,
Fred
collins says
Dear Fred, I am glad I stumbled on this website. Could you please help me with the formula or method to calculate reliability without assuming a constant failure rate? Thanks.
Fred Schenkelberg says
Hi Collins,
Reliability has four elements, function, environment, probability, and duration. If you have a product or system and time to failure information, you can determine the probability of surviving a duration by determining the percentage of units that survive the duration.
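As a sketch of the "percentage of units that survive the duration" idea, assuming every unit in a hypothetical sample was run to failure (no censored units):

```python
def empirical_reliability(times_to_failure, duration):
    """Fraction of units that survived beyond `duration`.

    Assumes every unit was run until it failed, so each time in the
    list is a true failure time.
    """
    survivors = sum(1 for t in times_to_failure if t > duration)
    return survivors / len(times_to_failure)

# Hypothetical failure times (hours) for 10 units run to failure.
ttf = [310, 450, 520, 610, 700, 760, 880, 940, 1100, 1350]
print(empirical_reliability(ttf, 500))
print(empirical_reliability(ttf, 1000))
```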
When working with a design or a prediction, it gets a bit trickier and involves understanding the dominant failure mechanisms and failure models.
If you’d like to calculate the time to failure distribution and want to do it by hand, you may find the short tutorial "Calculating Lognormal Distribution Parameters" of interest. Of course, that is only useful if the distribution reasonably describes the pattern of your time to failure data.
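A minimal by-hand sketch of estimating lognormal parameters: take the natural log of each failure time, then the mean and standard deviation of those logs. The data are hypothetical, and this sketch uses the sample (n - 1) standard deviation; the tutorial mentioned above may use a different convention.

```python
import math
import statistics

def lognormal_params(times):
    """Return (mu, sigma): mean and sample standard deviation of ln(t)."""
    logs = [math.log(t) for t in times]
    return statistics.mean(logs), statistics.stdev(logs)

def lognormal_reliability(t, mu, sigma):
    """R(t) = 1 - Phi((ln t - mu) / sigma)."""
    return 1.0 - statistics.NormalDist().cdf((math.log(t) - mu) / sigma)

# Hypothetical times to failure (hours).
ttf = [410, 530, 620, 700, 860, 990, 1250]
mu, sigma = lognormal_params(ttf)
print(f"mu = {mu:.3f}, sigma = {sigma:.3f}, "
      f"R(500 h) = {lognormal_reliability(500, mu, sigma):.3f}")
```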
Cheers,
Fred
Nnamdi says
Q5. Two nickel-cadmium batteries provide electrical power to operate a satellite transceiver. When both batteries are operating in parallel, each has a failure rate of 0.1 per year. If one fails, the other can operate the transceiver (at a reduced power output); however, the increased electrical demand will triple the failure rate of the remaining battery. Determine the system reliability at 1, 2, 3, 4, and 5 years. What is the system MTTF?
Or
Q6. A pumping station has two identical pumps connected in parallel, each capable of supplying 3000 gallons/hr. The failure rate and repair rate of each are 0.5 failures/hr and 4 repairs/hr, respectively. Evaluate the frequency of encountering, and the duration of residing in, each possible throughput state. I need help with either of the two questions.
Fred Schenkelberg says
First off, do not calculate MTTF or use it for anything other than academic studies (and even there, only in limited situations); it is not a useful calculation.
Second, for parallel systems see
https://lucas-accendo-site-speed.sprod01.rmkr.net/parallel-systems/
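For Q5 specifically, one way in is a small Markov model of a load-sharing parallel pair, assuming exponential battery lifetimes and an instant switch to the tripled rate when one battery fails. Leaving the both-up state at rate 2λ = 0.2 and failing from the one-up state at rate 0.3 gives R(t) = 3e^(-0.2t) - 2e^(-0.3t). A sketch:

```python
import math

LAMBDA_SHARED = 0.1   # per battery per year while both operate (from Q5)
LAMBDA_ALONE = 0.3    # tripled rate for the surviving battery (from Q5)

def load_sharing_reliability(t, lam=LAMBDA_SHARED, lam_alone=LAMBDA_ALONE):
    """Two-unit load-sharing parallel system with exponential lifetimes.

    The both-up state is left at rate 2*lam; from the one-up state the
    system fails at rate lam_alone. Requires lam_alone != 2*lam.
    """
    a = 2.0 * lam
    p_both = math.exp(-a * t)
    p_one = a / (lam_alone - a) * (math.exp(-a * t) - math.exp(-lam_alone * t))
    return p_both + p_one

for year in range(1, 6):
    print(year, round(load_sharing_reliability(year), 4))

# Asked for in Q5; the R(t) values above say far more than this one number.
mttf_years = 1 / (2 * LAMBDA_SHARED) + 1 / LAMBDA_ALONE
```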
Third, for the two problems, here’s a similar problem and solution https://lucas-accendo-site-speed.sprod01.rmkr.net/two-pumps-problem/ that may provide some ideas on how to solve these.
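For Q6, a frequency and duration sketch, assuming the two pumps fail independently and each has its own repair crew, so the number of working pumps follows a birth-death process. Each state's mean residence time is 1 / (total exit rate) and its frequency is probability × exit rate.

```python
LAMBDA = 0.5          # failures per hour per pump (from Q6)
MU = 4.0              # repairs per hour per pump (from Q6)
PUMP_CAPACITY = 3000  # gallons per hour per pump

def two_pump_states(lam=LAMBDA, mu=MU):
    """Map throughput -> (probability, mean duration in h, frequency per h).

    Assumes independent pumps with one repair crew per pump.
    """
    a = mu / (lam + mu)   # steady-state availability of one pump
    u = lam / (lam + mu)  # steady-state unavailability of one pump
    states = {
        2: (a * a, 2 * lam),        # both up: leave when either fails
        1: (2 * a * u, lam + mu),   # one up: leave on a failure or a repair
        0: (u * u, 2 * mu),         # both down: leave when either is repaired
    }
    result = {}
    for pumps_up, (prob, exit_rate) in states.items():
        duration = 1.0 / exit_rate       # mean hours per visit
        frequency = prob * exit_rate     # visits per hour
        result[pumps_up * PUMP_CAPACITY] = (prob, duration, frequency)
    return result

for throughput, (p, d, f) in two_pump_states().items():
    print(f"{throughput:>5} gal/hr  P={p:.4f}  duration={d:.3f} h  frequency={f:.4f}/h")
```

A useful sanity check is that the one-pump state is entered exactly as often as the other two combined, since every visit to either extreme passes through it.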
Cheers,
Fred
steve says
Just wondering what you had in mind when you mentioned “if you’d like to talk about better ways to work between reliability and risk assessments, let me know” in your initial response.
Fred Schenkelberg says
Hi Steve, I am suggesting not using MTTF and instead gathering and using the time to failure information directly. A Weibull distribution is a good place to start. Understanding the changing nature of the failure rate is a much better way to connect your work to reliability. Cheers, Fred
Yonatan says
Hi Fred,
I was running an endurance test where I submerged 10 electrical units in a water tank and counted the hours until failure of each unit.
So, I have a list of 10 cumulative hours to failure (2 of the units are still working).
How would you recommend I treat that data? Eventually I’d like some sort of index of how well the product withstands being under water (in order to compare it to other products). In addition, I’d like a Mean Time To Failure (note: once a unit fails I do not fix it) or something similar, such as the time until the probability of failure reaches, let’s say, 1%?
I have JMP , if this helps.
Thank you in advance
Yonatan K.
Fred Schenkelberg says
I’m not very familiar with JMP, yet you may be able to fit a Weibull distribution to the data, including the two right-censored points. I would not bother to calculate the MTTF or MTBF, as it is really not all that useful. Why would someone want to know the average time to failure instead of when the first percentile is expected to fail?
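A sketch of what such a fit does under the hood: maximum likelihood for a two-parameter Weibull with right-censored units, followed by the B1 life (the time by which 1% are expected to fail), which answers the 1% question directly. The hours below are hypothetical stand-ins for ten submerged units (eight failures, two still running), not Yonatan's data; dedicated software would normally do this fit.

```python
import math

def weibull_mle(failures, suspensions):
    """Maximum likelihood (beta, eta) for a 2-parameter Weibull.

    `failures` are times at which units failed; `suspensions` are
    right-censored times (units still running when the test stopped).
    Beta solves the profile-likelihood equation by bisection;
    eta then follows in closed form.
    """
    all_times = list(failures) + list(suspensions)
    t_max = max(all_times)
    s_all = [t / t_max for t in all_times]  # rescale so s**beta cannot overflow
    r = len(failures)
    mean_log_fail = sum(math.log(t / t_max) for t in failures) / r

    def profile(beta):
        weights = [s ** beta for s in s_all]
        weighted_log = sum(w * math.log(s) for w, s in zip(weights, s_all))
        return weighted_log / sum(weights) - 1.0 / beta - mean_log_fail

    lo, hi = 0.01, 100.0  # profile() is increasing in beta
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if profile(mid) < 0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    eta = t_max * (sum(s ** beta for s in s_all) / r) ** (1.0 / beta)
    return beta, eta

def b_life(p, beta, eta):
    """Time by which a fraction p of units is expected to have failed."""
    return eta * (-math.log(1.0 - p)) ** (1.0 / beta)

# Hypothetical hours: eight units failed, two were still running at 600 h.
fail_hours = [120, 190, 250, 310, 360, 410, 480, 560]
still_running = [600, 600]
beta, eta = weibull_mle(fail_hours, still_running)
print(f"beta = {beta:.2f}, eta = {eta:.0f} h, B1 = {b_life(0.01, beta, eta):.0f} h")
```

Comparing B1 (or B10) lives across products gives the kind of "how well does it withstand water" index asked about, without averaging away the early failures.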
Cheers,
Fred
Dnes says
If a component has an MTTF of 12 weeks and the component is changed before 12 weeks, does this mean failure will be avoided?
I know it won’t be avoided, but can I have a slightly more detailed answer?
Thanks
Fred Schenkelberg says
Hi Dines,
No of course not, and I suspect you already knew this.
Let’s say the item really does have a constant hazard rate and the exponential distribution fits the failure pattern over time rather well… then the MTTF is the inverse of the chance of failure each hour. Think of it as rolling a die with many sides and if it turns up 1, we call it a failure. So, if the MTTF is 20,000 hours, that means each hour there is a 1 in 20,000 chance of failure.
Over 20k hours you would roll the die 20,000 times.
The math works out to there being about a 63 percent chance you will have a failure prior to 20k hours.
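The die analogy can be checked numerically: the chance of at least one failure in 20,000 hourly "rolls", each with a 1-in-20,000 chance, is essentially the same as the exponential model's F(t) at t = MTTF.

```python
import math

mttf_hours = 20_000
p_fail_each_hour = 1 / mttf_hours

# Chance of at least one "1" in 20,000 rolls of a 20,000-sided die.
p_discrete = 1 - (1 - p_fail_each_hour) ** mttf_hours
# Continuous exponential model: F(t) = 1 - exp(-t / MTTF) at t = MTTF.
p_exponential = 1 - math.exp(-1)
print(f"{p_discrete:.4f}  {p_exponential:.4f}")
```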
Now, the actual failure rate may be decreasing or increasing over time, so my example won’t apply exactly. We’d have to understand more than just the given MTTF. But it is likely there will be a failure sooner than you outlined.
For replacement to improve your availability (by avoiding unscheduled failures), the item needs to have an increasing failure rate. If it does, and you replace the item when the chance of failure is deemed too high, you will minimize unwanted failures (though not eliminate them entirely).
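A sketch of why replacement helps only with an increasing failure rate: the chance of failing in the next week, given survival so far, is 1 - R(age + 1) / R(age). The Weibull parameters are hypothetical (beta = 3 for wear-out, eta = 14 weeks).

```python
import math

def weibull_reliability(t, beta, eta):
    """R(t) = exp(-(t/eta)**beta)."""
    return math.exp(-((t / eta) ** beta))

def conditional_failure_prob(age, horizon, beta, eta):
    """P(fail within the next `horizon` | survived to `age`)."""
    return 1.0 - (weibull_reliability(age + horizon, beta, eta)
                  / weibull_reliability(age, beta, eta))

# Hypothetical wear-out item, time in weeks.
beta, eta = 3.0, 14.0
for age in (0, 4, 8, 12):
    print(age, round(conditional_failure_prob(age, 1, beta, eta), 4))
```

With beta = 1 the conditional chance is the same at every age, so replacing early buys nothing; with beta above 1 it climbs with age, which is what makes a replacement interval worthwhile.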
Hope that helps.
Cheers,
Fred
ahmed says
Hello Fred,
Can you please explain how you arrived at the 63% result?
BR, Ahmed
Fred Schenkelberg says
Hi Ahmed, 63% of items would be expected to fail when the duration equals the MTBF. F(t) = 1 - exp(-t / theta), so when t = theta we have e to the negative one, or about 0.37; with the one minus, the probability of failure at time t is about 0.63, or 63%.
Cheers,
Fred
Fred Schenkelberg says
Check out this week’s article on the topic you sparked.
http://nomtbf.com/2016/09/replace-mttf-time-avoid-failures-right/
Cheers,
Fred
shyam says
Hi Fred, I have a situation where I am trying to calculate MTTF for a single pump, where the pumps are in a 2 x 100 configuration. The pumps are operated by switching them every 6 months, so in 10 years each pump runs for 5 years. The total number of failures between the two pumps is 6. What’s my MTTF?
My answer is: total uptime for one pump / number of failures of that pump
5 / 3 = 1.67 years.
Is that right?
Fred Schenkelberg says
It is that easy, and it is also not useful for any purpose that I can imagine.
Instead, if you have the operating time to failure for those six failures, that may allow sorting out a rough estimate of a time to failure distribution. Or, if the pumps are repairable systems, then plot using a mean cumulative function, again looking for information on how the failure rate changes over time.
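A minimal sketch of a nonparametric mean cumulative function for repairable systems, assuming each pump's failure ages are measured in its own operating time and observation of each pump ends at a known age. The failure ages are hypothetical, since only the count of six failures is given.

```python
def mean_cumulative_function(histories):
    """Nonparametric MCF for repairable systems.

    `histories` maps a system name to (failure_ages, end_of_observation).
    At each failure age the MCF rises by 1 / (number of systems still
    under observation at that age).
    """
    events = sorted(
        (age, name)
        for name, (failures, _) in histories.items()
        for age in failures
    )
    mcf, curve = 0.0, []
    for age, _ in events:
        at_risk = sum(1 for _, end in histories.values() if end >= age)
        mcf += 1.0 / at_risk
        curve.append((age, mcf))
    return curve

# Hypothetical operating ages (years) at failure, each pump observed to 5 years.
pumps = {
    "pump A": ([0.8, 2.9, 4.1], 5.0),
    "pump B": ([1.5, 3.4, 4.6], 5.0),
}
for age, value in mean_cumulative_function(pumps):
    print(f"{age:.1f} years  MCF = {value:.2f}")
```

Plotted against age, a curve that steepens suggests a worsening failure rate and one that flattens suggests improvement, which a single MTTF cannot show.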
Cheers,
Fred
rajat says
Sir, we are freshers and we have to calculate the reliability of a system. The system consists of 10 parts, and we have to find the reliability of the system. I have to develop a Java program. So I am assuming that the MTBF will be given to me, and from that I am calculating the failure rate and then the reliability.
So am I doing it the right way?
Please help.
Fred Schenkelberg says
Hi Rajat,
Basically, you are on the right track. You may or may not receive MTBF for the parts. Here are a couple of articles that may help.
https://lucas-accendo-site-speed.sprod01.rmkr.net/2012/02/07/series-system-2/
and
https://lucas-accendo-site-speed.sprod01.rmkr.net/2012/01/10/weakest-link/
And you need to consider how the 10 parts are organized reliability-wise: in series, where any one part failure means a system failure, or with some parts in parallel. If in parallel, then it’s a slightly different model.
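Since the plan is to start from MTBF values, here is a sketch (in Python rather than Java, for brevity) of the two basic models under the constant failure rate assumption that MTBF implies: in series the failure rates add, while an active-parallel group fails only when all of its parts have failed. The ten MTBF values are hypothetical.

```python
import math

def series_reliability_from_mtbf(mtbfs, t):
    """Series system of constant-failure-rate parts: rates add."""
    total_rate = sum(1.0 / m for m in mtbfs)
    return math.exp(-total_rate * t)

def parallel_reliability_from_mtbf(mtbfs, t):
    """Active-parallel group: fails only if every part has failed."""
    prob_all_failed = 1.0
    for m in mtbfs:
        prob_all_failed *= 1.0 - math.exp(-t / m)
    return 1.0 - prob_all_failed

# Hypothetical MTBFs (hours) for the 10 parts, all in series.
mtbfs = [50_000, 80_000, 120_000, 40_000, 200_000,
         75_000, 90_000, 60_000, 150_000, 100_000]
print(series_reliability_from_mtbf(mtbfs, 1_000))
```

These formulas hold only if every part really has a constant failure rate; parts that wear out need their own reliability functions instead.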
http://lucas-accendo-site-speed.sprod01.rmkr.net/parallel-systems/
Of course, your system may be more complex, and there are other models that can describe your system.
Cheers,
Fred
Nathaniel says
Hi Fred,
could you please help me out on this?
20 components with a constant failure rate were observed over 25 hours of use, and seven failed at the following hours: 2.7, 8.7, 10.7, 15.7, 16.7, 20.7, 24.7, while 13 were still functioning. How do I calculate the mean time to failure?
Fred Schenkelberg says
Hi Nathaniel, if you did not replace any failed units simply sum the total operating time for both those that have failed and haven’t. Divide by the number of failures.
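The tally for Nathaniel's numbers, as a quick sketch (no replacement of failed units, and all 13 survivors ran the full 25 hours):

```python
failure_hours = [2.7, 8.7, 10.7, 15.7, 16.7, 20.7, 24.7]
survivors, test_length = 13, 25.0

# Failed units contribute their failure times; survivors contribute 25 h each.
total_hours = sum(failure_hours) + survivors * test_length
mttf = total_hours / len(failure_hours)
print(f"total = {total_hours:.1f} h, MTTF = {mttf:.1f} h")
```

That gives about 60.7 hours, longer than the whole test, which is one reason the figure says little on its own.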
Of course, that is a pretty meaningless figure. Instead, look at the time to failure and censored data using a Weibull analysis. Much more informative.
cheers,
Fred
Ruslan says
Hi Fred,
Currently studying some reliability course, and couple of questions raised, maybe you can help:
1) When does MTTF have the same numerical value as MTBF?
2) What is the difference between MTTR (Mean Time to Repair) and MPMT (Mean Preventive Maintenance Time)?
Thanks in Advance,
Ruslan
Fred Schenkelberg says
Hi Ruslan, just a couple of notes here. 1. MTTF and MTBF are the same when the tallied hours divided by the number of failures come out the same. In general, for MTTF we tally just the time to first failure, while for a repaired system we count all operating hours, so that system may have more than one failure. The math is the same for both, so there are a few ways the MTTF and MTBF of a repairable system can be the same: when the system(s) run without failures; when the operating time does not include any time after a failure (the system is not repaired, etc.); and an infinite number of other situations where the non-repairable group runs longer than the repairable system such that the MTTF and MTBF are the same.
In short, MTTF and MTBF are rather meaningless measures and should be avoided.
from my upcoming book on terms in reliability engineering:
Mean Time To Repair (MTTR): A basic measure of maintainability: the sum of corrective maintenance times at any specific level of repair, divided by the total number of item failures during a particular interval under stated conditions.
Mean maintenance time: The measure of item maintainability taking into account maintenance policy: the sum of preventive and corrective maintenance times, divided by the sum of scheduled and unscheduled maintenance events, during a stated period of time.
MPMT uses just preventive maintenance time here, and I have found numerous differences in these types of definitions, often depending on local custom and practice. For a specific maintenance task, one team calls it a corrective task while another team calls it preventive, etc.
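A sketch of how the three averages differ when computed from the same hypothetical maintenance log, assuming each corrective event corresponds to exactly one item failure:

```python
# Hypothetical maintenance log for one asset: (kind, hours of maintenance).
log = [
    ("corrective", 6.0), ("preventive", 2.0), ("preventive", 1.5),
    ("corrective", 10.0), ("preventive", 2.5), ("corrective", 2.0),
]

corrective = [h for kind, h in log if kind == "corrective"]
preventive = [h for kind, h in log if kind == "preventive"]

mttr = sum(corrective) / len(corrective)   # corrective time per failure
mpmt = sum(preventive) / len(preventive)   # preventive time per PM event
mean_maintenance_time = (sum(corrective) + sum(preventive)) / len(log)
print(mttr, mpmt, mean_maintenance_time)
```

Relabeling a single task from corrective to preventive changes all three numbers, which is why local definitions matter so much.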
Hope that helps. By the way, why are you taking a course that asks you to know about MTBF? That seems silly. They should be teaching you something useful instead.
Cheers,
Fred
Hossein says
Hi Fred
We work at an oil drilling company and are calculating MTTF, MTBF, and MTTR for the first time. I want to know the following:
1) Is the goal of the above calculations reliability?
2) Are the results of the above calculations numbers or diagrams?
Regards
Fred Schenkelberg says
Hi Hossein,
1) No, not really. Reliability is a function of time, R(t), while MTBF is the inverse of an average failure rate, which strips out the time. MTTF and MTBF are just point estimates and do not convey the essential element of how the failure rate changes over time. If you are tracking times to failure, do not use MTBF or MTTF, as they reduce your ability to understand your equipment, system, or situation.
MTTR is another average and again not very helpful or informative.
2) I do not understand this question. MTBF, MTTF, and MTTR are numbers; they are averages. They can be plotted, yet a plot of such numbers won’t help you understand your system’s reliability.
Cheers,
Fred
Mahesh kodoth says
Hi Fred,
I was just reading through your posts and discussions, which seem very interesting to me. Could I draw on your expertise for a problem I am facing?
Basically, I have reviewed several published data sources to obtain failure data for safety valves. However, since the data comes from several different sources, I need to refine this initial data before I use it in my analysis. I am not sure the MTBF method can be applied here, as I have directly fetched the failure rates rather than knowing the operating time and number of failures. In such a case, what is the best way to analyse the sample data? Is it okay to estimate the gamma parameters a and b from the failure data I already have and then estimate the MTBF using a gamma distribution model? I am not very confident in my approach, so some suggestions would be really helpful.
Fred Schenkelberg says
Hi Mahesh, if you have time to failure data and sufficient information for a life distribution (gamma, Weibull, etc.), that would be great. Much better than using a calculation of MTTF or MTBF. A difficult part of the analysis will be normalizing the data from different sources: same conditions, definition of failure, etc. Good luck. Cheers, Fred
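If the pooled data are times to failure, a quick hand estimate of gamma parameters is the method of moments: shape k = mean² / variance and scale θ = variance / mean. This is a sketch with hypothetical times; maximum likelihood is usually preferred when software is available, and the data from different sources should be normalized first.

```python
import statistics

def gamma_method_of_moments(times):
    """Shape k and scale theta matching the sample mean and variance."""
    mean = statistics.mean(times)
    var = statistics.variance(times)   # sample (n - 1) variance
    return mean ** 2 / var, var / mean

# Hypothetical, already-normalized times to failure (years) for one valve type.
ttf_years = [3.1, 4.8, 5.5, 6.2, 7.9, 9.4, 11.0]
k, theta = gamma_method_of_moments(ttf_years)
print(f"shape k = {k:.2f}, scale theta = {theta:.2f} years")
```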
MPlacedes says
Good day Mr. Schenkelberg,
I am a student currently conducting a study on spares management. As part of computing the company’s maximum safety stock, we must determine its maximum demand, which can be based on its possible failure rates. Most machines’ failure rates increase over time. Can you please shed some light on my issue? How can I compute the spares’ probability of failure, which can be used as the basis for future demand? Thanks in advance.
Fred Schenkelberg says
Hi – good question and one that MTBF or MTTF values really do not help you solve.
Let’s say you’re considering stocking spare bearings and you know that bearings wear out over time thus have an increasing failure rate over time. The Weibull distribution provides a good description of the increasing failure rate. Great.
Check out the article https://lucas-accendo-site-speed.sprod01.rmkr.net/2-parameter-weibull-distribution-7-formulas/ for the conditional survivor function (you can use the cumulative distribution function as well) to calculate the number of expected failures, say, each month or each year.
If you install 100 new bearings on Jan 1st 2018, you should expect no failures for the first 6 months and maybe 1 or 2 over the next 6 months, so the stocking level for the first year is very low. Now jump out 5 years: of the original 100 we may need to replace 20 or 30 bearings. (These are fictional numbers, just to show that by knowing how the failure rate changes over time, via the Weibull distribution, you can adjust stocking levels appropriately.)
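A rough sketch of that calculation with hypothetical Weibull parameters (beta = 2.5, eta = 10 years), counting only first failures of the original 100 bearings; replacement bearings failing in turn are ignored, and the output will not match the fictional numbers above.

```python
import math

def weibull_cdf(t, beta, eta):
    """Probability of failure by time t: F(t) = 1 - exp(-(t/eta)**beta)."""
    return 1.0 - math.exp(-((t / eta) ** beta))

def expected_first_failures(n_installed, start, end, beta, eta):
    """Expected number of the original units failing in (start, end]."""
    return n_installed * (weibull_cdf(end, beta, eta) - weibull_cdf(start, beta, eta))

# Hypothetical bearing life: beta = 2.5 (wear-out), eta = 10 years.
beta, eta = 2.5, 10.0
for year in range(1, 8):
    print(year, round(expected_first_failures(100, year - 1, year, beta, eta), 2))
```

The yearly expected counts rise with age, which is the pattern a stocking plan would follow.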
I do not think I have a worked out example, and it would make a good article, so thanks for the topic idea.
Cheers,
Fred
Maria says
Hi Fred,
What do you think about newer methodologies (e.g., 217Plus/PRISM, FIDES, etc.) created to replace the out-of-date MIL-HDBK-217F?
Thanks in advance for your reply.
Fred Schenkelberg says
Hi Maria,
I don’t think too much of them, as they continue to rely on faulty assumptions and constructions. They are still essentially parts count methods and rely on the constant hazard rate and the every-component-can-cause-a-failure approach.
They have more adjustments and factors, yet given the basic approach, they are just as bad as the Mil Std ever was.
Cheers,
Fred