Reliability engineering has an image problem. It is seen as an embuggerance that destroys budget, schedule and fun. People sometimes think reliability engineering is simply statistics, data analysis and other mind-numbing stuff. Reliability purgatory. Which brings us to the first reason you need to do reliability engineering.
#1 – Reliability engineering is not reliability purgatory. Reliability purgatory is all effort and no outcomes. Reliability happens at the point of decision. Design decisions. Manufacturing decisions. Maintenance decisions. True reliability engineering helps you make better decisions – which often comes down to organized judgment and not statistics.
The second reason you need to do reliability engineering is to #2 eliminate problems – not just failure. We typically fixate on ‘failure’ when talking about reliability engineering. True reliability engineering prevents problems. Like manufacturing issues that force you to delay launch OR launch sub-standard products. Problems like finding out you selected the wrong material at the final design review. Getting tolerances wrong, having circuitry too close to hot exhaust manifolds and anything else that forces you to redo stuff is what true reliability engineering eliminates.
So true reliability engineering means #3 no complex expensive fixes. Thinking about reliability at the point of decision means you incorporate simple things that make robust designs from the start. And there are plenty of great tools to help you #4 quickly solve the VITAL FEW problems – and not the trivial thousands. Fixing the trivial thousands is over-engineering. Today’s customers and users are demanding smaller, lighter and ‘funkier’ things. Not over-engineered monstrosities.
Now this might surprise you – especially if you have had bad ‘reliability experiences.’ Reliability engineering means #5 happier people. When reliability engineering is baked into culture, you don’t have loud ‘infant managers’ berating engineers, designers, manufacturers and maintainers into doing the wrong thing fast (WTF). Any short-term cost and time saving quickly blossoms into those problems we talked about above. Then comes blame. Costs are incurred. Expenditure is cut to the bone. There is no money to innovate. Yuck.
True reliability engineering creates positive experiences. And #6 saves (LOTS OF) time and money. Not only are happy people more productive, but our now robust product, system or service hasn’t encountered many (if any) production problems.
But we are not done yet. Reliability engineering #7 makes your thing better (than your competitors). Failure isn’t limited to your machine breaking, exploding or physically disintegrating. Failure occurs when we fail to meet our customer’s or user’s expectation. And that means all our reliability tools can be unleashed to help you come up with amazing new features and functionality … before we start designing.
Happy people. Knowing what they need to do. Focusing on the vital few. No production problems. #8 No overwhelm. Reliability purgatory involves ‘process zealots’ insisting you do everything ‘by the (very old) book.’ Or ‘ponderous professors’ complaining about not having enough data and taking decades to analyze what ‘little’ data they have. True reliability engineering involves logically working out the VITAL FEW things we need to do to help make better decisions.
The inevitable by-product is #9 value. We are on time and budget. Time to market (TTM) is reduced, and we can perhaps reduce our recommended retail price. (True) reliability, budget and schedule never compete – no matter what people tell you.
And the final reason you need to do reliability engineering is #10 your reputation. If you consistently design, manufacture or maintain better things cheaply and quickly … you will get noticed.
So why doesn’t reliability happen more often? Perception? Reliability purgatory? Are these the only things stopping you?
DonMacArthur7 says
Thanks for the outstanding article Chris!
Christopher Jackson says
Thanks Don. Glad you liked it!
Larry George says
YES!
“Or ‘ponderous professors’ complaining about not having enough data and taking decades to analyze what ‘little’ data they have.”
Pardon me for complaining that people don’t use the data they have. Ships (installed base cohort counts) and returns (complaints, repairs, replacements, spares sales, etc.) are required by GAAP. It may take a little work to dig out the data, but they are population, not sample data. Ships and returns counts are statistically sufficient to make nonparametric reliability and failure rate function estimates, without unwarranted assumptions, even for repairable systems. Sample on web site in list of files, “NPMLE”
I hope Fred will include an article on that.
Christopher Jackson says
Thanks for your feedback Larry. Although only a small part of my article deals with that … it is an endemic problem. I note that your example is about trying to assess (warranty) reliability after launch, but there is also a lesson to be learned for data analysis during production to help inform design, manufacturing and maintenance decisions. We are too often surrounded by data we never use to help us make better decisions. And an 80 % correct decision made today is better than a 100 % correct decision when it is too late.
Larry George says
“lesson to be learned for data analysis during production to help inform design, manufacturing and maintenance decisions.”
Field reliability of older products and their parts informs design, manufacturing and maintenance decisions, if people use that information! New products are made with some old parts, produced in the same old processes, sold to same or similar customers, and used in same or similar environments. https://sites.google.com/site/fieldreliability/credible-reliability-prediction
Thanks for the addition of embuggerance to my vocabulary.
Christopher Jackson says
Completely agree … but of course I am happiest with expanding your vocabulary!
Merrill Jackson says
I really like your article.
I’ve thought about using the movie, “Quigley Down Under”, as an analogy to doing the wrong thing fast. The approach used by the bad guy, Marston, and his men is to fire as many bullets as fast as they can. The approach used by the good guy, Quigley (played by Tom Selleck), is that of a sniper with his extra-long barrel, 50 caliber rifle. You use careful aim, from a very long distance, using only one shot.
The contrast is when you are trying to prevent problems, you use the sniper approach. If you did not prevent a problem, you are left with trying to use a Phalanx and do everything in your power as fast as you can no matter the cost, with a low probability of success, high risk of creating other problems, and no time to recover.
Christopher Jackson says
Thanks Merrill. Or in other words … ‘ready – aim – fire’ and not ‘ready – fire – aim.’ The beating heart of good reliability engineering is those really simple, inexpensive but thoughtful characteristics we design into our products, systems or services … from the start. Reliability engineering gives us the framework for going ‘ready – aim – fire.’