A Few Simple Ideas to Improve Your Reliability Program
Spending too much on reliability and not getting the results you expect? Just getting started and not sure where to focus your reliability program? Or, just looking for ways to improve your program?
There is no single way to build an effective reliability program. Variations in industry, expectations, technology, and constraints shape each program. Here are three suggestions you can apply to any program at any time. These are not quick fixes, nor will you see immediate results, yet each will significantly improve your reliability program and help you achieve the results you and your customers expect.
1. Stop using MTBF
Given the nature of this blog and site, you may have expected I would recommend not using MTBF. The same applies to MTTF, MTBUR, and the many variations of MTxxx that exist.
The primary reason is that MTBF is not useful. It doesn’t help you and your team make decisions that lead to improved reliability.
The focus should be on balancing what you know about reliability performance against your other priorities. While you may have fantastic cost-of-goods data, reliability data is often vague. Do not cloud that scant information by hiding what it means behind MTBF.
Instead, use reliability directly: the probability that the product performs its function within a defined environment over a specified duration. For example, my phone has a goal of making and receiving calls in my home office with a 99% probability of success over five years.
Use reliability
- to set goals,
- to do apportionment,
- to set vendor requirements,
- to make comparisons,
- to define and report reliability tests,
- and anywhere you talk about reliability.
The clarity alone will improve your reliability program.
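To see how much an MTBF value can hide, here is a minimal sketch. It assumes a constant failure rate (an exponential life distribution), which is the assumption usually buried inside an MTBF figure and which often does not hold for real products. The 50,000-hour MTBF and the function names are hypothetical, chosen only for illustration.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days * 24 hours

def reliability_from_mtbf(mtbf_hours: float, duration_hours: float) -> float:
    """Probability of surviving duration_hours.

    ASSUMPTION: constant failure rate (exponential distribution).
    """
    return math.exp(-duration_hours / mtbf_hours)

def mtbf_needed(reliability: float, duration_hours: float) -> float:
    """MTBF that corresponds to a reliability goal, same assumption."""
    return -duration_hours / math.log(reliability)

mtbf = 50_000  # hypothetical MTBF claim, in hours (about 5.7 years)

for years in (1, 5):
    r = reliability_from_mtbf(mtbf, years * HOURS_PER_YEAR)
    print(f"Surviving {years} year(s): {r:.1%}")

# Fraction still working when the operating time equals the MTBF itself
print(f"Surviving to the MTBF: {reliability_from_mtbf(mtbf, mtbf):.1%}")

# The phone goal from the text: 99% over five years, stated as an MTBF
goal_mtbf = mtbf_needed(0.99, 5 * HOURS_PER_YEAR)
print(f"MTBF equivalent to 99% over 5 years: {goal_mtbf:,.0f} hours")
```

Under this assumption, only about 37% of units survive to the MTBF, and the 99% over five years goal translates to an MTBF in the millions of hours. Neither number tells the team much, while the reliability statement is immediately clear.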
2. Do your failure analysis
When something fails in a prototype or in a fielded product, you need to understand what failed. Not only that, you need to know the root cause of the failure. To implement a design, process, or assembly fix that addresses the failure, you need to understand what happened.
Isolating the failed part and sending it off to the vendor for analysis rarely works.
Vendors rarely have the capability to determine the root cause of a failure when looking at an isolated part or two. The failed component may be a victim of poor design or another element that has failed.
The element that fails may have design or assembly problems. Or it may sit downstream from another element of the product that is not functioning properly. We don’t send blown fuses to the vendor for analysis. Instead, we look for the cause of the excessive current.
Instead, work directly with a third-party failure analysis laboratory. You will likely get answers faster, and you will likely learn more about the potential root causes, especially if the root cause lies in the vendor’s design or assembly process. It is your failure analysis, so you need unbiased information to determine the appropriate corrective action.
3. Identify reliability risks early
Product testing, reliability testing, and demonstration tests are too little, too late in most cases. You cannot afford to test in reliability, especially late in the development process.
Also, simply testing a product may not find novel or rare failure mechanisms. Limited sample sizes alone can keep significant issues from showing up. The inability to detect precursors to serious problems is another limitation. Testing is not a great means of identifying reliability risks.
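The sample size point is easy to check with a little arithmetic. The sketch below assumes each unit under test fails independently with the same probability (a simple binomial model), and the function names and numbers are hypothetical, picked only to illustrate the idea.

```python
import math

def prob_at_least_one_failure(failure_fraction: float, sample_size: int) -> float:
    """Chance the test shows at least one failure.

    ASSUMPTION: each unit fails independently with probability
    failure_fraction over the test duration.
    """
    return 1.0 - (1.0 - failure_fraction) ** sample_size

def zero_failure_sample_size(reliability: float, confidence: float) -> int:
    """Units that must all pass to demonstrate `reliability` at `confidence`.

    This is the zero-failure (success-run) binomial plan:
    n = ln(1 - confidence) / ln(reliability), rounded up.
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))

# A failure mechanism that affects 1% of units, tested with 30 samples
p_detect = prob_at_least_one_failure(0.01, 30)
print(f"Chance 30 samples reveal a 1% problem: {p_detect:.0%}")

# Samples needed, with no failures allowed, for 90% confidence
for goal in (0.90, 0.99):
    n = zero_failure_sample_size(goal, 0.90)
    print(f"Demonstrate {goal:.0%} reliability: {n} units")
```

With 30 samples there is roughly a one in four chance of even seeing a failure mechanism that affects 1% of units, and demonstrating 99% reliability at 90% confidence takes 230 failure-free units. Few programs can afford that late in development.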
Instead, use tools like FMEA and HALT as early and as often as possible in the development process to help you focus on the failure mechanisms that present the most risk to the reliable performance of your product. Give your development team time to focus on solving or mitigating the reliability risks facing your product.
Summary
Those are just three suggestions. I’ve seen each of them make significant improvements to reliability programs in diverse industries.
What would you recommend for a top 10 list of improvements to a reliability program? What works for you, and why? Please leave a comment or question below, as I would like to hear from you.
Hilaire Perera says
Fred, I agree with what you said about "Stop using Mean Values". But there are situations when no failure/time distributions are available to calculate reliability. In such cases, mean values can be used with a statistical confidence level. The statistical confidence level is the probability that the corresponding confidence interval covers the true (but unknown) value of a population parameter. Such a confidence interval is often used as a measure of uncertainty about estimates of population parameters.
Fred Schenkelberg says
Hi Hilaire, thanks again for the comment – you always add valuable content with your comments. Besides the mean with confidence intervals, one could also use non-parametric approaches to sidestep any issues with underlying distribution assumptions. cheers, Fred
Tugbe Sulureh says
Without a time-to-failure distribution, an analysis of the physics of the failure mechanisms can be used to establish a likelihood function. The likelihood function can then be analyzed and the parameters of interest, including reliability, can be calculated.
Fred Schenkelberg says
That is true Tugbe, and an advanced approach. Do you have examples of this approach that you can share?
Cheers,
Fred
yi kang says
Thanks Fred, great summary.
To add my 2 cents, one should understand the design and how customers behave in the field, and feed that into the FMEA.
Cheers,
E.
Fred Schenkelberg says
Agree Yi, thanks for the addition. cheers, Fred