Reliability and Availability
In English there is a lot of confusion about what reliability, availability, and the other ‘ilities mean in a technical sense. In advertising and everyday conversation, reliability often means dependable or trustworthy. Applied to a product or system, it may simply mean that it will work as expected.
Availability is a less common term, yet available implies that a person or system is present and ready to engage or start. When picking up a friend at their home, if they are prepared and ready to depart when I arrive, then we can say they were available. This is closer to the technical meaning that we reliability professionals use.
We have a couple of issues to overcome when talking about reliability and availability.
Common misunderstanding
First, we need to be very clear when we talk about reliability and availability. By reliability we mean that a system will function as expected in the described environment with a specific probability of successful performance over a duration. The common meaning of reliability is closer to availability: it will work when we need it. The car will start.
When talking about reliability, be clear. First, either state or imply the system, function, and environment (easy when everyone knows we’re talking about a specific product with well-documented functions and environmental conditions). Second, always state reliability as a couplet of a probability of success and a duration. For example, a 98% chance of survival over 2 years.
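To see why the probability needs its duration, here is a minimal Python sketch. It assumes a constant failure rate (an exponential life distribution), which is an illustrative assumption and not something this article claims for any real product. The function names are hypothetical.

```python
import math

def failure_rate_from_couplet(probability, duration):
    """Constant failure rate implied by a (probability, duration) couplet.

    Assumes R(t) = exp(-rate * t), an illustrative assumption only.
    """
    if not 0.0 < probability <= 1.0 or duration <= 0:
        raise ValueError("need 0 < probability <= 1 and duration > 0")
    return -math.log(probability) / duration

def reliability_at(rate, t):
    """Probability of surviving to time t under the same assumption."""
    return math.exp(-rate * t)

rate = failure_rate_from_couplet(0.98, 2.0)   # 98% over 2 years
one_year = reliability_at(rate, 1.0)
four_years = reliability_at(rate, 4.0)
print(f"rate per year: {rate:.4f}")
print(f"R(1 year): {one_year:.4f}, R(4 years): {four_years:.4f}")
```

Under this assumption the same product shows roughly 99% reliability at 1 year and roughly 96% at 4 years, so "98% reliable" on its own does not say much.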
When there is any doubt, specify the function, environment, and use conditions, too. For example, a fan will provide xx airflow for a computer chassis in a US home environment 24 hours a day for 2 years with a 98% chance of successful operation. We could further define the use conditions to include RPM, average speeds, speed profiles, and environmental conditions such as temperature, humidity, dust, vibration, etc. The functional specifications may include back pressure, air inlet size and filters, etc.
Keep in mind that many understand reliability to mean it will work, and little more. Some believe a 2-year product implies a normal distribution of failure times with a mean value at 2 years.
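A quick check shows why the normal-distribution reading is a problem. If time to failure really were normally distributed with a mean of 2 years, half of the units would fail before 2 years, whatever the spread. The sketch below uses Python's standard `statistics.NormalDist`; the 0.5-year standard deviation is an arbitrary value assumed for illustration.

```python
from statistics import NormalDist

# Hypothetical life distribution: mean 2 years, sigma 0.5 years (assumed).
life = NormalDist(mu=2.0, sigma=0.5)

def reliability(t):
    """Probability a unit survives past time t (in years)."""
    return 1.0 - life.cdf(t)

print(f"R(2 years) = {reliability(2.0):.2f}")
print(f"R(1 year)  = {reliability(1.0):.4f}")
```

The reliability at the mean is 0.50 for any choice of sigma, which is rarely what someone describing a "2-year product" intends.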
Availability has fewer misunderstandings, yet it is not always clear how to measure it. What the customer or operator actually wants is for the equipment to be ready to function when it is expected to operate. For example, whenever I walk out to my car, I expect it to start. If I do not need to drive anywhere today, I’m not testing whether the car is or was available, yet I still expect it to be ready. Contrast that with knowing my car is in the shop for maintenance and thus not ready to go.
Relationship between reliability and availability
As you know, a system that doesn’t fail is reliable; it is also available. If a system doesn’t have any downtime for a year, we can say in hindsight that it had 100% reliability and availability over that year. A chance of failure may still exist for each moment of operation, so the expected reliability may have been 99% for the year; this past year we simply didn’t incur the failure, which should have been rare anyway.
Now, a system that fails regularly, say every month, may still have very close to 100% availability over a year if the time to restore (repair) the system to operation is very short. Let’s say it takes only a few seconds to restore the system; then even if the reliability is very low over the year, the availability remains very high.
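The contrast can be sketched in Python. The numbers are assumptions chosen to match the scenario: about one failure a month, 5 seconds to restore each time, and a constant failure rate, so that the chance of a failure-free year is exp(-12).

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600

failures_per_year = 12      # fails about once a month (from the scenario)
restore_seconds = 5.0       # assumed value for "a few seconds"

# Availability: fraction of the year the system was up.
downtime = failures_per_year * restore_seconds
availability = (SECONDS_PER_YEAR - downtime) / SECONDS_PER_YEAR

# Reliability over the year: chance of no failure at all,
# assuming a constant failure rate (an illustrative assumption).
reliability = math.exp(-failures_per_year)

print(f"availability over the year: {availability:.6%}")
print(f"reliability over the year:  {reliability:.6%}")
```

With one minute of total downtime in a year, availability stays above 99.999%, while the chance of getting through the year without any failure is well under 0.001%.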
In practice, repair isn’t always that quick, so we often work both to maximize reliability and to minimize maintenance time. One way to measure availability is to tally the uptime of a system and divide by the sum of all uptime and downtime over a specific duration. So if a system is to operate 24/7 for a year (8,760 hours) and incurs 10 hours of downtime, the availability is 8,750 / (8,750 + 10) = 0.9989, or about 99.9% available.
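The tally above is simple enough to write as a few lines of Python. This is only a sketch: the function name and the outage log are hypothetical, with the outages chosen to add up to the 10 hours in the example.

```python
def availability(uptime_hours, downtime_hours):
    """Uptime divided by total (uptime + downtime) over the period."""
    total = uptime_hours + downtime_hours
    if uptime_hours < 0 or downtime_hours < 0 or total <= 0:
        raise ValueError("hours must be non-negative and total must be positive")
    return uptime_hours / total

HOURS_PER_YEAR = 24 * 365          # 8,760 hours of intended operation
outages = [4.0, 3.5, 2.5]          # hypothetical outage log, sums to 10 hours

downtime = sum(outages)
uptime = HOURS_PER_YEAR - downtime
result = availability(uptime, downtime)
print(f"{uptime:.0f} h up, {downtime:.0f} h down -> {result:.4f}")
```

Every outage the customer experienced goes into the log, whatever its cause, which keeps the metric simple and complete.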
When talking about reliability and availability, be clear, define the terms, and remain consistent. Note that I’m not diving into all the added, unnecessary confusion of sorting and limiting what is counted as a failure (many do not include no-trouble-found failures) or the many different elements of downtime (diagnostic, wrench, and logistics times). Keep it simple, count everything the customer considers a failure, and be very clear. A complex algorithm for reliability or availability rarely sheds any light on what has happened, so keep the metrics simple and complete.
Hilaire Perera says
Availability is a performance criterion for repairable systems that accounts for both the reliability and maintainability properties of a component or system. It is defined as the probability that the system is operating properly when it is requested for use. That is, availability is the probability that a system is not failed or undergoing a repair action when it needs to be used. There are a number of different classifications of availability:
• Instantaneous (or Point) Availability.
• Average Up-Time Availability (or Mean Availability).
• Steady State Availability.
• Inherent Availability.
• Achieved Availability.
Mark Powell says
Ananda,
The term “Availability” is not limited to only “repairable” systems.
For example, ammunition is most definitely not a “repairable” system, and availability of ammunition is a function of usage rate, manufacturing rate, logistics, and administrative and shipping delays. Likewise with missiles and rockets.
Availability really has the more general definition Fred used: it is the measure of the uncertainty (a probability) that the system is in a state ready to perform its intended function.
Mark Powell
Gary Goldring says
In my previous role I was often presented with availability requirements for my comment. Such a requirement might read “The system is to have an availability of 95%”.
My comments usually went something like:
“What is the time boundary for this level of availability?”
“What is the underpinning reliability requirement? (preferably expressed as probability of mission success)”
“Why do you want this level of availability?”
“Assuming your 5% unavailability equates to 438 hours of downtime over a year, what would be the impact on your system if all that downtime occurred in one period, i.e. just over 18 days of downtime?”
That usually got them thinking and so we would re-write the requirement to suit the actual need based on equipment/usage profile/environment etc.
Too many project teams tried to disconnect reliability from availability until I explained that a piece of equipment which is unreliable but easy to fix can have the same level of availability as a reliable piece of equipment – hence the need to underpin availability with reliability.
Fred Schenkelberg says
Excellent point Gary, and really like the questions you would ask. May have to use those and a few others to draft an article. cheers, Fred