Accendo Reliability

Your Reliability Engineering Professional Development Site

by Kevin Stewart

RCA is the Bedrock of a Reliability Program

Basic Reliability Definition

Occasionally, I like to step back and reflect on reliability in basic terms.

In that spirit, the basic premise of reliability is usually stated as “The probability that an item will perform a required function, without failure, under stated conditions, for a stated period of time.”

To use the reliability equation, you must first define what counts as a failure, so you can tell whether your equipment has indeed failed. That way you can include it in the MTBF (Mean Time Between Failures) calculation.
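As a minimal sketch of that calculation, MTBF can be taken as total operating time divided by the number of events that meet the failure definition. The function name and the logged figures below (4,000 operating hours, 5 failures) are hypothetical and chosen only for illustration.

```python
def mean_time_between_failures(operating_hours: float, failure_count: int) -> float:
    """Return MTBF in hours: total operating time divided by recorded failures."""
    if failure_count <= 0:
        raise ValueError("MTBF is undefined without at least one recorded failure")
    return operating_hours / failure_count


# Hypothetical record: 4,000 operating hours, 5 events meeting the failure definition.
mtbf_hours = mean_time_between_failures(4000.0, 5)
failure_rate = 1.0 / mtbf_hours  # lambda = 1 / MTBF, in failures per hour

print(f"MTBF = {mtbf_hours} hours, failure rate = {failure_rate} per hour")
```

The result is only as good as the failure definition: an event that is not counted as a failure never reaches the denominator, which inflates the MTBF.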

After you have defined failure and recorded each occurrence appropriately, you can plug the numbers into the reliability equation, R = e^(-λt), where λ is the failure rate, defined as λ = 1/MTBF, and come up with an objective value for the reliability.

Based on this equation, as the number of failures goes down, the reliability increases, assuming all the other parameters stay unchanged.

This is important because I can also increase the reliability by reducing the “t”, or mission time the system is expected to perform the function.

In other words, if I can’t make a system go 8 hours without a failure, I would have R = 0. However, if it could always be counted on to go 7 hours without a failure, then I could change the mission time to 7 hours and have R = 1 (100% probability of going 7 hours without a failure).
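The two levers described above, fewer failures and a shorter mission time, can be sketched with the reliability equation as follows. The 0 and 1 values in the example above are idealized extremes; under the constant-failure-rate model the result always falls strictly between them. The MTBF values of 100 and 200 hours are assumptions made up for illustration.

```python
import math


def reliability(mission_hours: float, mtbf_hours: float) -> float:
    """R = e^(-lambda * t), with lambda = 1 / MTBF (constant failure rate assumed)."""
    failure_rate = 1.0 / mtbf_hours
    return math.exp(-failure_rate * mission_hours)


# Hypothetical system with an MTBF of 100 hours.
r_8h = reliability(8.0, 100.0)          # original 8-hour mission
r_7h = reliability(7.0, 100.0)          # shorter mission, same failure rate
r_8h_better = reliability(8.0, 200.0)   # same mission, half as many failures

print(f"8 h mission, MTBF 100 h: R = {r_8h:.4f}")
print(f"7 h mission, MTBF 100 h: R = {r_7h:.4f}")
print(f"8 h mission, MTBF 200 h: R = {r_8h_better:.4f}")
```

Both changes raise R, but only the last one reflects a real improvement in the equipment.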

In a manufacturing environment, the goal is always to extend the mission time, so we tend to ignore the time adjustment issue. As companies try to achieve improved reliability, they implement systems, hire consultants, install software, establish preventive maintenance programs, implement RCM, and use many other great tools, all of which are necessary to provide or support maintenance strategies that increase reliability.

Which is just another way of saying eliminate or reduce failures.

It is all about eliminating failures

So at a basic level, it is all about eliminating or reducing failures.

For example, one of my first forays into reliability was an issue with a bearing that was causing significant downtime. There were four identical systems, and the plant had floated a capital improvement authorization to add a 5th.

The reason for this was they couldn’t keep the system running due to bearing failures. I started investigating, and a tradesperson suggested that we should look at system one. “Why?”, I asked. “Because system one isn’t failing,” they replied.

I then wanted to know why system one was not failing. The tradesperson didn’t know, so we scheduled an inspection to see if we could find out.

As the mechanic started to raise the pillow block cover, I said: “OK, you can put it back down.” The mechanic looked at me like I was crazy. I asked him to do it again, but this time I asked him to look at the bearing.

He did and realized the same thing – it was different than all the others.

So, we buttoned it up and investigated. Luckily the system we just inspected had not had a bearing changed in a while and that drove us to uncover the fact that someone had replaced the spherical ball bearing with a spherical roller bearing.

Others had just continued to follow suit and the problem perpetuated itself.

Continuous improvement

So, we put the correct bearing in all of the four systems, canceled the capital improvement authorization, and I got an honorable mention from the plant manager.

We could have stopped there, but I kept asking “why?” when one of the bearings failed.  That led us to other issues.

We had to correct things such as improper installation, incorrect internal clearances, uncorrected soft foot, poor alignment practices (time card and eyeball), no correction for hot alignment, incorrect lubricant, incorrect lubrication frequency, and no vibration monitoring.

Most will recognize that doing all of these things is now what is referred to as precision installation.

The reliability of this system was improved by simply extending the MTBF from 1 month to 8 years, and we saved significant maintenance dollars and time.
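To put that improvement in rough numbers, here is a sketch that assumes a constant failure rate and treats the four systems as identical and independent. Only the MTBF figures (1 month and 8 years) and the count of four systems come from the account above; the function names and the one-month mission are illustrative assumptions.

```python
import math

MONTHS_PER_YEAR = 12


def expected_failures_per_year(mtbf_months: float, system_count: int = 1) -> float:
    """Expected failures per year, assuming a constant failure rate of 1 / MTBF."""
    return MONTHS_PER_YEAR / mtbf_months * system_count


def reliability(mission_months: float, mtbf_months: float) -> float:
    """Probability of completing the mission without failure: e^(-t / MTBF)."""
    return math.exp(-mission_months / mtbf_months)


mtbf_before = 1.0                  # 1 month, from the account above
mtbf_after = 8 * MONTHS_PER_YEAR   # 8 years expressed in months

failures_before = expected_failures_per_year(mtbf_before, system_count=4)
failures_after = expected_failures_per_year(mtbf_after, system_count=4)

print(f"Expected failures per year, four systems: {failures_before} -> {failures_after}")
print(f"R for a one-month mission: {reliability(1.0, mtbf_before):.3f} "
      f"-> {reliability(1.0, mtbf_after):.3f}")
```

Under these assumptions, the four systems go from dozens of expected bearing failures a year to fewer than one, which is consistent with the crew eventually losing practice at the repair.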

We improved the reliability so much that shortly before I left the plant, I got a call from the maintenance crew supervisor saying that they were having problems with the same systems.

In doing the root cause, it turns out that they were so reliable that people had forgotten how to fix them.

At the time, we had no CMMS, no PM program, obviously training issues, and had never heard of RCM.

I was also using an oscilloscope-type analyzer to do vibration analysis, with an X/Y pen plotter to capture the signature. From this experience, I learned that improving MTBF through Root Cause Analysis or defect elimination first can be quick and cost-effective, and it also doesn’t hurt your career.

I learned that it provided the quick wins that management wanted and that allowed them to take the leap of faith (for them) to support reliability.  I also learned that after doing Root Cause Analysis, there was still a need to capture the lessons learned in the CMMS.

This became evident when, after eight years, the lessons had not been captured and passed on to others.

It showed that there is a proper time to implement each of the tools that are available.

Lessons learned

So what has changed since then, back in the mid-1980s?

We have better vibration equipment and RCM. We have FMEAs and state-of-the-art CMMS, to name just a few. These are all valuable tools that integrate into an overall reliability program. The simple bearing example used in this article always makes me careful when discussing how best to attack an improvement project.

I can use all the tools available today and have no unscheduled downtime on equipment, yet still be doing more maintenance than necessary.

My lesson learned from this plant experience was that basic root cause analysis has a very prominent place in the total reduction of overall costs.

Many times it should be the first thing you consider implementing in your program because, as the title of this article suggests, it provides the bedrock on which to build your program.

Please let me know if you think RCA is the bedrock – if you don’t agree, what is?

Filed Under: Articles, on Tools & Techniques, Reliability Reflections Tagged With: Rca, Root Cause Analysis

About Kevin Stewart

I am an experienced educator and maintenance/reliability professional with 38 years of practical work experience in a variety of roles for ALCOA Primary Metals Group and ARMS Reliability.

© 2025 FMS Reliability · Privacy Policy · Terms of Service · Cookies Policy