Let’s say we have a population and we are interested in the mean (average) of that population’s life. We select a sample (at random if at all possible) and measure a value, like time to failure, for each selected item in the sample.
We calculate the mean life of the sample by summing the sample values and dividing by the number of items in the sample.
Because we are only using a subset of the population, it is possible that the sample items all come from one part of the population, say, only the tall part. It may not be likely, yet it is possible to select samples that do not represent the range of values in the population.
This possibility, that the sample statistic we expect to represent the population parameter does not even come close, is what statistical confidence describes. Stated positively, we say there is 95% confidence that the true, unknown population parameter falls within a range of values, called the confidence interval or confidence bounds. That leaves a 5% chance that the actual, unknown population parameter lies outside that range. In other words, we are 95% confident that the sample is ‘this’ close to the actual value.
For MTBF we are often interested only in the lower limit (a one-sided interval): with the stated confidence, we expect the true, unknown value to be above this lower confidence limit. For a Type II test, where reaching a predetermined number of failures terminates the test, the lower limit is
$latex \displaystyle \theta \ge \frac{2T}{\chi _{\left( \alpha ,2r \right)}^{2}}$
Where,
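- θ is the true (unknown) mean life, or MTBF, of the population
- T is the total operating time accumulated by all samples, up to each unit's failure or the end of the test
- χ² is the chi-squared critical value with α in the upper tail (the 1 − α quantile)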
- α is the level of risk (1 – confidence)
- r is the number of failures, 2r is then the degrees of freedom for the chi-squared
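The bound above can be computed directly. Below is a minimal, stdlib-only Python sketch, assuming exponentially distributed lifetimes (the assumption behind the chi-squared bound). It relies on the fact that for an even number of degrees of freedom, 2r, the chi-squared CDF has the closed form 1 − e^(−x/2) Σ_{k=0}^{r−1} (x/2)^k / k!, and it finds the critical value by bisection. The function names are hypothetical, chosen only for illustration.

```python
import math

# Assumes exponentially distributed lifetimes (constant failure rate),
# which is the model behind the chi-squared MTBF bound.


def chi2_cdf_even(x: float, dof: int) -> float:
    """CDF of the chi-squared distribution for an even number of degrees of freedom."""
    if dof <= 0 or dof % 2:
        raise ValueError("dof must be a positive even integer")
    if x <= 0:
        return 0.0
    half = x / 2.0
    # e^(-x/2) * sum_{k=0}^{dof/2 - 1} (x/2)^k / k!
    term, total = 1.0, 1.0
    for k in range(1, dof // 2):
        term *= half / k
        total += term
    return 1.0 - math.exp(-half) * total


def chi2_upper_critical(alpha: float, dof: int) -> float:
    """Value with probability alpha to its right (the 1 - alpha quantile), by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, dof) < 1.0 - alpha:  # widen the bracket until it holds the quantile
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if chi2_cdf_even(mid, dof) < 1.0 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def mtbf_lower_bound(total_time: float, failures: int, confidence: float) -> float:
    """One-sided lower confidence limit on MTBF for a failure-terminated (Type II) test."""
    alpha = 1.0 - confidence
    return 2.0 * total_time / chi2_upper_critical(alpha, 2 * failures)


print(round(chi2_upper_critical(0.10, 4), 3))  # 7.779
print(round(mtbf_lower_bound(3050, 2, 0.90)))  # 784
```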
Now an example. A test ended at its 2nd failure (r = 2) after a total time T of 3050 hours, giving an estimated MTBF of 3050/2 = 1525 hours. Calculate the 90% lower confidence limit for the MTBF.
$latex \displaystyle \theta \ge \frac{2(3050)}{\chi _{\left( 0.10,4 \right)}^{2}}=\frac{6100}{7.779}=784\text{ hours}$
This means there is a 90% chance that the true, unknown population MTBF is greater than 784 hours, and a 10% chance that it is less. Unless we determine the population mean (by measuring every unit in the population), we won’t know.
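The "90% chance" is easiest to see as a long-run frequency: if the same test were repeated many times, roughly 90% of the computed lower bounds would sit at or below the true MTBF. The Python sketch below simulates that, assuming exponential lifetimes and a hypothetical true MTBF of 1,000 hours (any value gives the same coverage). For a Type II test under that model, the total time T at the r-th failure is distributed as the sum of r exponential times with mean θ, which is what the simulation draws; it reuses the tabulated value 7.779 from the example.

```python
import random

TRUE_MTBF = 1000.0  # hypothetical population MTBF; unknown in practice
FAILURES = 2        # r, failures before the test stops
CHI2_CRIT = 7.779   # chi-squared value for alpha = 0.10 with 2r = 4 degrees of freedom
TRIALS = 20_000


def coverage(seed: int = 1) -> float:
    """Fraction of simulated tests whose 90% lower bound is at or below the true MTBF."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(TRIALS):
        # Total accumulated test time at the r-th failure, assuming exponential lifetimes
        total_time = sum(rng.expovariate(1.0 / TRUE_MTBF) for _ in range(FAILURES))
        lower = 2.0 * total_time / CHI2_CRIT
        if TRUE_MTBF >= lower:
            hits += 1
    return hits / TRIALS


print(coverage())  # prints a value near 0.90
```

Any single simulated test either brackets the true value or it does not; the 90% describes how often the procedure succeeds across repetitions.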
For fun, suppose we are willing to take more risk that the sample does not represent the population. Same sample; we change only the confidence. Going from 90% to 60%, we find
$latex \displaystyle \theta \ge \frac{2(3050)}{\chi _{\left( 0.40,4 \right)}^{2}}=\frac{6100}{4.045}=1{,}508\text{ hours}$
That is a higher value. Interesting: same data, more risk, a narrower confidence range. By accepting more risk, we are saying there is now a greater chance that the true, unknown parameter falls outside the range described by the confidence interval. The true value doesn’t change and the sample statistic doesn’t change, yet the lower confidence limit is higher than before.
Without careful consideration, it appears the population lasts about twice as long, 784 versus 1,508 hours. Nothing has actually changed except the increased risk that the sample does not represent the true value.
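To see how far the reported number moves with the chosen confidence alone, the sketch below holds the example data fixed (T = 3050 hours, r = 2) and sweeps the confidence level. It is a stdlib-only Python sketch assuming exponential lifetimes; with 2r = 4 degrees of freedom the chi-squared CDF reduces to 1 − e^(−x/2)(1 + x/2), which is inverted here by bisection. The function names are illustrative.

```python
import math

TOTAL_TIME = 3050.0  # T in hours, from the example (r = 2 failures, so 2r = 4 dof)


def chi2_cdf_4dof(x: float) -> float:
    """Chi-squared CDF with 4 degrees of freedom (2r for r = 2)."""
    return 1.0 - math.exp(-x / 2.0) * (1.0 + x / 2.0) if x > 0 else 0.0


def lower_bound(confidence: float) -> float:
    """Lower MTBF limit 2T / chi2(alpha, 2r); the critical value is the `confidence` quantile."""
    lo, hi = 0.0, 100.0  # the 4-dof quantiles needed here lie well inside this bracket
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if chi2_cdf_4dof(mid) < confidence:
            lo = mid
        else:
            hi = mid
    return 2.0 * TOTAL_TIME / ((lo + hi) / 2.0)


for conf in (0.60, 0.70, 0.80, 0.90, 0.95):
    print(f"{conf:.0%} confidence: MTBF >= {lower_bound(conf):,.0f} hours")
```

The 60% and 90% rows reproduce the 1,508 and 784 hours above; the data never change, only the risk accepted.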
Check out the next datasheet that crosses your desk: too many use 60% confidence. Why is that?
nm says
Thank you for the useful article. I’m working in the reliability and failure analysis field. As you mentioned, datasheets usually use a 60% confidence level. Would you please answer the question “too many use a 60% confidence – why is that?” And for sensitive projects, should I use the data for a 60% CL or, for example, 90%? Thank you.
Fred Schenkelberg says
Hi NM,
Keep in mind that the confidence interval is an indication that the reported result comes from a sample used to represent the population. We often do not have information such as sample size or test conditions needed to properly judge the values.
A single test with a fixed sample size will report different values at different confidence levels. The 60% level will appear ‘better’ than the 90% level, yet the underlying data hasn’t changed. The 60% level shifts more of the risk to you, as they are saying there is a 40% chance the actual product will not perform within the stated range.
If the item under consideration is important to the performance of your product, do not accept vendor data that is poorly reported. Get the details or do the testing/characterization yourself. You need to understand how the item will fail and how the testing evaluates that failure mechanism.
Cheers,
Fred
SJ says
Hi Fred
How would you go about interpreting exactly what the following statement means?
“Reliability required = 98% to 95% confidence of mission success”
The only other data I can see given is that
“Availability required = 95% based on 10 days operation in 2 month deployment”
and
“Assume 500 running hours per year” over a 15-year life
Thanks in advance
Fred Schenkelberg says
Not a whole lot to go on here. If someone wants a 98% chance of surviving, they really should tell us over what duration. This may be very easy to accomplish over an hour and nearly impossible over 20 years.
The 95% confidence is only useful for planning the testing; it is not part of the goal. The goal concerns the population and what chance of success you (or they) really want.
For test planning and in particular sample size selection we need to know things like confidence. This helps us know what risk of the sample not representing the population we’re willing to accept.
If you change the confidence you may need more or fewer samples, yet the actual reliability doesn’t change.
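As an illustration of confidence driving sample size while the reliability target stays fixed, the sketch below uses the zero-failure "success run" relation n = ln(1 − C) / ln(R). This is a common planning formula assumed here, not one given in the reply; it assumes every unit is tested for the full mission duration with no failures allowed. The function name is hypothetical.

```python
import math


def success_run_sample_size(reliability: float, confidence: float) -> int:
    """Units needed, with zero failures, to demonstrate `reliability` at `confidence`.

    Assumes the zero-failure success-run model: the smallest n with
    reliability ** n <= 1 - confidence.
    """
    if not (0.0 < reliability < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("reliability and confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))


print(success_run_sample_size(0.98, 0.95))  # 149
print(success_run_sample_size(0.98, 0.60))  # 46
```

Dropping the confidence from 95% to 60% cuts the sample from 149 to 46 units, yet the 98% reliability being claimed is the same.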
What is the mission duration, environment, set of stresses, use/stress profile, etc.?
500 hours per year over 15 years is great, but is that the mission time?
During a mission can we have failures and quick repairs, or is that not an option?
The little information provided means you could supply pretty much any product or system you want at any reliability level, and you’re probably good (or could look good under specific circumstances).
I would be asking a lot more questions.
Cheers,
Fred
SJ says
Thanks, Fred, I suspected something like that. For the mission, I think we take 10 days in 2 months, so mission time ≈ 85 hours. No repairs during the mission.
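If lifetimes are assumed exponential (the same model behind the chi-squared MTBF bound in the article), a reliability target over a fixed mission converts to an MTBF target through R(t) = e^(−t/θ). The sketch below applies that to the 98% requirement and the roughly 85-hour mission described above. The exponential model is an assumption, and the 95% confidence still enters only later, when planning the test. The function name is hypothetical.

```python
import math


def required_mtbf(reliability: float, mission_hours: float) -> float:
    """MTBF such that exp(-mission_hours / MTBF) equals the reliability target.

    Assumes a constant failure rate (exponential lifetimes).
    """
    return -mission_hours / math.log(reliability)


print(f"{required_mtbf(0.98, 85):,.0f} hours")  # 4,207 hours
```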