The classic formula for availability is MTBF divided by MTBF plus MTTF. Standard. And pretty much wrong most of the time.
Recently working for a bottling plant design team we pursued the design options to improve availability and throughput of the new line. The equipment would remain basically the same, filler, capper, labeler, etc. So we decided to gather the last 6 months or so of operating data which included up and down time. Furthermore the data included time to failure and time to repair information.
The plant traced availability using the classic formula, simply the total operating or up time by the total run time. We learned during plant visits that the longer runs tended to get better availability. And the data, looking at short runs (1 day) versus long runs (5 days) did show a marked difference in availability. It also showed the MTBF went up with longer runs. The MTTR remained pretty constant or went up just a little.
Recall we have the ‘time to’ data. A little work with a spreadsheet and we fit distributions to the data. Weibull, Lognormal and Exponential clearly fit different equipment results. And, we had a wealth of data, permitting very good data analysis and distribution fit decisions. For sake of this article, let’s assume the equipment operating time is well described by a Weibull distribution with a beta less than 1. And the time to repair data is well described by the lognormal distribution. While this wasn’t universal across all the types of bottles and equipment, it will work for the purpose of this article
So why do we commonly use MTBF with the assumption of an underlying exponential distribution? There are many reasons: ignorance, “always done that way”, lack of ‘time to’ data or in rare cases the assumption is valid. Let’s remove the ignorance excuse and ‘always….’ and make the case to get the right data at the same time
Why is using MTBF so bad for availability calculations? Let’s look at one example, the design of the bottling line. There are up to 10 steps in the high volume bottling process and each step involve highly complex and expensive equipment. The design team was balancing the line availability and throughput per shift with the cost of the equipment. The other part of the design consideration was the cost of storing finished goods with the change over time between flavors and bottle sizes. The idea was to add just enough redundant equipment that the line change over time and line availability permitted frequent line changes which significantly reduce finished goods storage costs.
In a perfect plant, each flavor and bottle size combination would have a dedicated line. In reality the expected run times of the flavor and bottle size combinations ranged from a few hours per month to two weeks a month, with most running for less than a week. Also, consider the equipment cost a few million dollars per unit.
Therefore, we built a availability model of the existing line configuration and the various proposed line configurations. The model would permit the simulation of various expected line management policies to determine ability to reduce finished good storage costs. The model would significantly influence the million dollar decisions in the project.
Back to why we want to look at the underlying and very common assumption – MTBF and MTTR often assume the exponential distribution. This distribution does not account for changes in the failure or repair rate. The first hour and any hour of operation after that have exactly the same average failure rate or repair rate.
Recall the observation and supporting data that suggest neither MTBF nor MTTR are constant, both seem to depend to some degree on the length of the run. Well, I’m a statistician and the plant had years worth of data on the equipment. Happiness.
First get the data and determine the best fitting distribution. This is basic regression analysis. Here is an example of the difference between assuming the exponential and fitting a distribution.
Here the data was drawn at random from a Weibull Beta 2 Eta 1000 distribution. the Weibull fit is pretty good as expected. Forcing the fit to exponential overestimates the failure rates at earlier times, and those earlier times of often of most interest.
Second, determine how to calculate availability given the fitted distributions. This second step had me hitting the books.
The general formula to calculate availability is based on expected values and not MTBF and MTTR based on exponential distributions. ‘Expected Values’ if you are like most engineers you many only have a vague recollection of this statistical term. Most associated this phrase with the mean or average value, which is mostly true. The availability formula changes from
$latex \displaystyle A=\frac{\text{MTBF}}{\text{MTBF}+\text{MTTR}}$
to this
$latex \displaystyle A(t)=\frac{{{E}_{failures}}[t]}{{{E}_{failures}}[t]+{{E}_{repairs}}[t]}$
If your are familiar with the Weibull distribution you recall if beta is equal to 1, the characteristic life is the theta. The same as for an exponential distribution. When the beta is not equal to one, the characteristic life is not equal to theta. The characteristic life is another way to say expected value.
For the exponential distribution the expected value calculation is very commonly used to calculate MTBF.
$latex \displaystyle {{E}_{Expontential}}[t]=\frac{1}{\lambda }$
Whereas, for the Weibull distribution the formula is
$latex \displaystyle {{E}_{Weibull}}[t]=\lambda \Gamma \left( 1+\frac{1}{\beta } \right)$
and, for the Lognormal distribution the formula is
$latex \displaystyle {{E}_{Lognormal}}[t]={{e}^{\mu +\frac{1}{2}{{\sigma }^{2}}}}$
Therefore, with the data properly described with the appropriate distribution we calculate the expected values and determine the availability. Easy.
Stephanie says
Thanks so much for the great information. I always love to read more about the industry.
Thank you.
Productivity and Reliability Based Maintenance, Since these based on the reliability, How can we describe tihs? says
I follow up any data from you.
Hilaire Perera says
The availability described in the begining is Inherent Availability Ai. There are many types of availability. Operational Availability Ao is a measure of the average availability over a period time . ( It does not contain MTBF ).
It includes all experienced sources of downtime such as administrative downtime, logistic downtime, etc. Operational Availability is the ratio of the system uptime and total time. Mathematically it is given by Ao = ( Uptime ) / ( Operating Cycle )