Accendo Reliability

Your Reliability Engineering Professional Development Site

by nomtbf

Basic Outline to Craft Your Professional Development Plan

Getting Enough of the Right Professional Development

Learning the basics of reliability engineering is where we all start. Mastering the range of skills and techniques is a never-ending quest.

Improving, maintaining, and expanding your reliability engineering professional skills takes many forms. There are plenty of options and sources to support your education, yet are you getting enough of the right material? [Read more…]

Filed Under: Articles, NoMTBF

by nomtbf

Are Your Reliability Engineering Technical Skills Good Enough?

How do you know? How would you know?

There is a lot to know concerning the technical aspects of reliability engineering. From calculating summary statistics to discovering the root cause of a failure, the body of knowledge you should master as a reliability professional is expansive. [Read more…]

Filed Under: Articles, NoMTBF

by Oleg Ivanov

Lifetime Evaluation vs. Measurement. Part 3.

Sometimes shifting your perspective
is more powerful than being smart.

—Astro Teller

Guest post by Oleg Ivanov

A common approach for “no failure” testing is the use of the well-known expression

$1 - CL = R^{n} \qquad (1)$

where CL is the confidence level, R is the required reliability, and n is the sample size. Its parent is a Binomial distribution with zero failures. This expression is like a poor girl: [Read more…]
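Expression (1) can be rearranged to give the sample size for a zero-failure test: n ≥ ln(1 − CL) / ln(R). The following is a minimal sketch of that arithmetic only; the function names are illustrative and are not taken from the post.

```python
import math


def zero_failure_sample_size(reliability: float, confidence: float) -> int:
    """Smallest n satisfying R**n <= 1 - CL, from expression (1)."""
    if not (0.0 < reliability < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("reliability and confidence must be strictly between 0 and 1")
    # Both logarithms are negative, so the ratio is positive; round up to a whole unit.
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))


def demonstrated_confidence(reliability: float, n: int) -> float:
    """Confidence level reached when n units all pass, again from expression (1)."""
    return 1.0 - reliability ** n


if __name__ == "__main__":
    n = zero_failure_sample_size(reliability=0.90, confidence=0.90)
    print(f"Units required with zero failures: {n}")
    print(f"Confidence actually demonstrated: {demonstrated_confidence(0.90, n):.4f}")
```

Demonstrating 90% reliability at 90% confidence this way takes 22 units with no failures; 21 units falls just short of the 90% confidence level.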

Filed Under: Articles, NoMTBF

by nomtbf

How to Avoid Delivering Bad Data

We gather and report loads of data nearly every day.

Is your data “good data”? Or does it fall into the “bad data” category?

Let’s define the difference between good and bad data. Good data is accurate, timely, and useful. Bad data is not. It may be time to look at each set of data you are collecting or reviewing and judge if it’s good or not. Then set plans in motion to minimize the presence of bad data in your organization.

Good data is accurate

By this I mean it truly represents the item or process being measured.

If the mass is 2.3 kilograms, then the measurement should be pretty close to 2.3 kg. This is a basic assumption we make when reviewing measurements, yet when was the last time you checked? Use a different measurement method, possibly a known accurate method, to check.

Measurement system analysis includes a few steps to determine if the gage making a measurement is true or not. Calibration may come to mind, as it is a step to verify the gage readings are reflecting standard measures. A meter is a meter is a meter across the many ways we can measure distance.

It also includes checking the common sources of measurement error:

  • Repeatability
  • Reproducibility
  • Bias
  • Linearity
  • Stability

You may also want to understand the resolution or discrimination of the measurement process.

If these terms, and how one goes about checking for accuracy, are unfamiliar, it may be time to learn a little about measurement system analysis (MSA).
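As a rough illustration of two of the checks above, bias and repeatability can be estimated from repeated readings of a reference item. This is only a sketch: the readings are made-up values for the 2.3 kg example, the function names are illustrative, and a full MSA study (for example a gage R&R with several operators and parts) involves more than this.

```python
import statistics

REFERENCE_KG = 2.3  # known mass of the reference item from the example above

# Hypothetical repeated readings of the same item by one operator on one gage.
READINGS_KG = [2.31, 2.29, 2.32, 2.30, 2.33, 2.31, 2.30, 2.32]


def estimate_bias(readings, reference):
    """Bias: average reading minus the accepted reference value."""
    return statistics.mean(readings) - reference


def estimate_repeatability(readings):
    """Repeatability: sample standard deviation of repeated readings."""
    return statistics.stdev(readings)


if __name__ == "__main__":
    print(f"Bias: {estimate_bias(READINGS_KG, REFERENCE_KG):+.4f} kg")
    print(f"Repeatability (1 sigma): {estimate_repeatability(READINGS_KG):.4f} kg")
```

Comparing these two numbers against the tolerance you care about is a quick first indication of whether the gage deserves a closer look.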

Good data is timely

If the experiment results are available a week after the decision to launch the product, they will not be considered in the decision. They are not useful for the decision concerning product launch. Had the data been available, it might have altered the decision. Because it was late, we will never know.

Timely means it is in time for someone or some team to make a decision. Ideally, the data is available immediately. When a product fails in the field, we would like to know right away, not two or three months later. If a production line becomes unstable, knowing before another unit of scrap is produced would be timely.

Not all data gathering and reporting is immediate. Some data takes months or an entire year to gather. There are physical constraints in some situations that delay the gathering of data. For example, it takes on average 13 minutes, 48 seconds for radio signals to travel from a space probe orbiting Mars to Earth [1]. If you are making important measurements on Earth, the delay should be shorter.

The key point here is that the data should be available when it is needed to make decisions.

Good data is useful

Even if the data is accurate and timely, it may not be useful. The data could be from a perfect measurement process, yet measure something we do not need to know or consider. The data gathered does not help inform us concerning the decision at hand.

For example, if I’m perfectly measuring production throughput, it does not help me understand the causes of the production line downtime. While related to some degree, instead of the tally of units produced per hour, what we really would find useful is data concerning the number of interruptions to production, plus details on the root cause of each.

Setting up and maintaining the important measurements is difficult as we often shift focus based on the current data. We spot a trend and want to learn more than the current data can provide. The idea is we should not set up and only use a fixed set of data collection processes. Ideally your work to gather data is driven by the need to answer questions.

  • Is the maintenance process improving the equipment operation?
  • Is our manufacturing process stable and capable of creating our product?
  • Will the current product design meet life expectations/requirements?
  • Have we confirmed the new design ‘fixed’ the faults seen in the last prototype?

We have questions and we gather data to allow us to answer questions.

How would you describe the data you will look at today? Good or Bad? And more importantly, do you know if your data is good or bad?

—

[1] Time delay between Mars and Earth, Thomas Ormston, posted 5/8/2012, European Space Agency, Mars Express Blog, http://blogs.esa.int/mex/2012/08/05/time-delay-between-mars-and-earth/ accessed 4/29/2016

Filed Under: Articles, NoMTBF

by Oleg Ivanov

Lifetime Evaluation vs. Measurement. Part 2.

Guest post by Oleg Ivanov

A result of life testing can be measurement or evaluation of the lifetime.

Measurement of the lifetime requires a lot of testing to failure. The results provide us with the life (time-to-failure) distribution of the product itself. It is long and expensive.

Evaluation of the lifetime does not require as many test samples and these tests can be without failures. It is faster and cheaper [1]. A drawback of the evaluation is that it does not give us the lifetime distribution. The evaluation checks the lower bound of reliability only, and interpretation of the results depends on the method of evaluation (the number of samples, test conditions, and the test time). [Read more…]

Filed Under: Articles, NoMTBF

by nomtbf

MTBF: According to a Component Supplier

This one made me scratch my head and wonder. Did I read this right?

A reader sent me an excerpt of a document found on Vicor’s site.

“Reliability is quantified as MTBF (Mean Time Between Failures) for repairable product and MTTF (Mean Time To Failure) for non-repairable product. A correct understanding of MTBF is important. A power supply with an MTBF of 40,000 hours does not mean that the power supply should last for an average of 40,000 hours. According to the theory behind the statistics of confidence intervals, the statistical average becomes the true average as the number of samples increase. An MTBF of 40,000 hours, or 1 year for 1 module, becomes 40,000/2 for two modules and 40,000/4 for four modules…”

source: http://www.vicorpower.com/documents/quality/Rel_MTBF.pdf

The excerpt came with the following note and question:

“In my opinion this is completely wrong but as I’m fledgling in this subject I’m sensitive to any statements like this.

Could you be so kind and help me a bit on it?”
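The figures in the excerpt can be checked with a little arithmetic. This is a minimal sketch, assuming a constant failure rate (an exponential life distribution) and a series arrangement in which any one module failing counts as a failure. Neither assumption is stated in the excerpt, and the function names are illustrative.

```python
import math

HOURS_PER_YEAR = 24 * 365  # 8,760 hours of continuous operation
MTBF_HOURS = 40_000.0      # value quoted in the excerpt


def years_of_operation(hours: float) -> float:
    """Convert operating hours to years of continuous (24/7) use."""
    return hours / HOURS_PER_YEAR


def reliability_at(t_hours: float, mtbf_hours: float) -> float:
    """Probability of surviving to t under an assumed constant failure rate."""
    return math.exp(-t_hours / mtbf_hours)


def series_mtbf(mtbf_hours: float, n_modules: int) -> float:
    """MTBF of n identical modules in series (failure rates add)."""
    return mtbf_hours / n_modules


if __name__ == "__main__":
    print(f"40,000 h is {years_of_operation(MTBF_HOURS):.2f} years")
    print(f"R(1 year)  = {reliability_at(HOURS_PER_YEAR, MTBF_HOURS):.3f}")
    print(f"R(at MTBF) = {reliability_at(MTBF_HOURS, MTBF_HOURS):.3f}")
    for n in (2, 4):
        print(f"{n} modules in series: {series_mtbf(MTBF_HOURS, n):,.0f} h")
```

Under these assumptions, 40,000 hours is about 4.6 years of continuous operation rather than 1 year, roughly 37% of units would be expected to survive to the MTBF, and dividing by the number of modules describes the series system as a whole, not any individual module.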

[Read more…]

Filed Under: Articles, NoMTBF

by nomtbf

Does Your FMEA Study Go Far Enough?

Extend Your FMEA Process with Mechanisms

One of the issues I’ve had with failure modes and effects analysis is the focus on failure modes.

The symptoms that the customer or end user will experience are important. If a customer detects that a product has failed, that is a failure. The FMEA process does help us to identify and focus on the important elements of a design that improve the product reliability. That is all good.

The issue is the FMEA process doesn’t go far enough to really help the team focus on what action to take when addressing a failure mode. The process does include the discussion of causes of the failure mode. The causes are often the team members’ educated opinions on what is likely to cause the failure mode. Often the description of a cause is a failed part, faulty code, or faulty assembly.

Generally the discussion of causes is vague.

Failure Mechanisms versus Failure Modes

Failure modes are best described as what the customer experiences (no power, loss of function, etc.). Failure mechanisms are the root physical or chemical anomaly that leads to the existence of the failure mode. While we want to remove failure modes, we have to solve, remove, or mitigate failure mechanisms along the way.

The traditional FMEA process, in my experience, often provides vague classes of causes, hints at potential failure mechanisms, or avoids specifying mechanisms entirely. The action items from the FMEA study then include investigations to find and understand the actual failure mechanisms (at best) or attempt to address vague classes of mechanisms with broad sweeps of monitoring, testing, or design changes.

Focusing the discussion of causes at the level of failure mechanisms instead enhances the discussion. Instead of talking about the cause as a component failure, the conversation turns to what happens such that the component fails. Instead of a vague average failure rate, it becomes a discussion about the design or process errors or variation that lead to the component’s demise.

The hard part of this approach is the sheer number of ways (root causes) that an item may fail. Consider a simple component solder joint. The potential root causes include:

  • Contamination
  • Corrosion
  • Dendrite growth
  • Cracking
  • Shear fracture
  • Flex cracking
  • Pad lifting
  • Gold embrittlement

And many other potential issues. Even these brief descriptions may have underlying causes, which are the elements requiring attention in order to solve the problem.

Fault Tree Analysis (FTA) and FMEA

Detailing all possible root causes of each failure mode would be tedious and, I would suggest, unnecessary. One approach I’ve seen starts with the common approach to FMEA, where we explore the class or basic expected types of root causes that lead to the listed failure mode. Then, for the lines in the FMEA study that percolate to the items requiring attention, we conduct a detailed FTA that fleshes out the range and relative frequency of occurrence of the many different underlying failure mechanisms that lead to a specific failure mode.

If the primary cause of a failure mode is a faulty component, then what are the specific mechanisms that lead to a component being faulty? FTA is the right tool here. Used in conjunction with the highest risks identified in the FMEA, it permits the team to understand and solve or mitigate the right elements in the design or process to make a difference. Being specific with actions that make a difference is the key.

With your work to identify and resolve risks to reliability performance, how do you ensure the solutions are actually solving the right problem? What works for you in your organization? How do you extend your FMEA work into effective action?
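To make the FTA step concrete, the solder joint mechanisms listed earlier can be treated as basic events under a single OR gate leading to the failure mode. The sketch below assumes the events are independent and uses made-up probabilities purely for illustration; a real study would take these from test data, field data, or physics-of-failure models.

```python
import math

# Hypothetical probabilities of each mechanism occurring over the period of interest.
SOLDER_JOINT_MECHANISMS = {
    "Contamination": 0.004,
    "Corrosion": 0.002,
    "Dendrite growth": 0.001,
    "Cracking": 0.010,
    "Shear fracture": 0.003,
    "Flex cracking": 0.006,
    "Pad lifting": 0.002,
    "Gold embrittlement": 0.001,
}


def or_gate(probabilities):
    """Probability that at least one input event occurs, assuming independence."""
    return 1.0 - math.prod(1.0 - p for p in probabilities)


def ranked_mechanisms(mechanisms):
    """Mechanisms sorted from most to least likely."""
    return sorted(mechanisms.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    top_event = or_gate(SOLDER_JOINT_MECHANISMS.values())
    print(f"P(solder joint failure) = {top_event:.4f}")
    for name, p in ranked_mechanisms(SOLDER_JOINT_MECHANISMS):
        print(f"  {name:<20} {p:.3f}")
```

The ranking is the useful part: it points the team at the few mechanisms that dominate the failure mode, which is where specific corrective actions make a difference.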

Filed Under: Articles, NoMTBF

by nomtbf

Do You Have Enough Data?

To make informed decisions you need information.

To form conclusions you need evidence and a touch of logic.

To discover patterns you need data.

In each case, and others, we often start with data. The data we have on hand, or can quickly gather.

We organize data into tables, summarize it in reports, display it in dashboards, and analyze the results to make decisions. [Read more…]

Filed Under: Articles, NoMTBF

by nomtbf

The People Skills of a Good Reliability Engineer

Having the technical and business skills is not sufficient to be a good reliability engineer.

You must also work with other people. With your peers, across the management team, with suppliers, contractors, and customers.

The ability to work well with others is often complex and situational. Being aware of a few basic skills will allow you to learn and improve. Del Prette and Del Prette define social competence as the social skills

that meet the different inter-personal demands in the workplace in order to achieve the goals, preserve the well-being of the staff and respect the rights of each other.

A. Del Prette and Z. A. P. Del Prette, Psicologia das relações interpessoais: vivências para o trabalho em grupo, Vozes, Petrópolis, 2001.

An engineer needs an awareness of the social situation and how their behavior influences others, along with a capability to correctly understand the behavior and needs of others. The concepts discussed under emotional intelligence include:

  • self-awareness
  • self-regulation
  • motivation
  • empathy
  • social skills

Goleman, D., Emotional Intelligence: Why It Can Matter More than IQ. New York: Bantam Books (1995).
Goleman, D., Working with Emotional Intelligence. London: Bloomsbury Publishing (1998).

The ability to influence others or to aid in understanding a technical situation relies on effective communication. Beyond presenting the facts, findings, and conclusions, your communication must also build upon the audience’s current understanding and capability. Also, your presentation must address the needs and expectations of the audience. The audience’s needs are often unstated, thus the need for your ability to correctly assess the social situation.

A Meeting Example

Let’s say two engineers join a meeting to discuss an engineering problem that requires a solution. One engineer, Juan, has social skills and the other, Tomas, does not. As the team assembles, Juan arrives a minute or two early, greets his co-workers, and responds to greetings and comments pleasantly. Tomas arrives on time, does not greet anyone, and focuses on his laptop, catching up on a few email messages until the meeting is called to order.

A member of the team opens with a short review of the specific technical challenge and, as she starts to review what is known to date, she is interrupted by Tomas. Tomas launches into his solution for the problem and remarks that the remainder of the meeting is pointless as he already has an appropriate solution to implement. The solution is not obvious to the remainder of the group, which frustrates Tomas as he repeats his assertion that he has a solution. There is social tension building which Tomas does not recognize.

Juan does sense the discontent between Tomas and the rest of the team. Juan does not fully understand the problem nor Tomas’ solution, yet injects a few questions that help Tomas guide the team to better understand the problem and proposed solution. Juan facilitates a discussion between those with knowledge of the problem so the entire team fully understands the issue, and helps Tomas restate the proposed solution so that everyone understands it.

The story of this meeting could continue, with more examples of missing social awareness and skills and possible methods to create progress. Meetings without someone like Juan, who has the awareness and social adeptness to facilitate effective and socially acceptable communication, are the ones likely to end poorly.

Finding a solution is not the only goal of a problem solving meeting. It is finding a solution that the team can implement effectively. This requires the team to understand both the problem and the solution. Furthermore, the one meeting is likely part of a series of regular engagements for this team, and thus affects the ability of the team to work cohesively going forward.

It is when social behavior elements such as discussions, questions, and proposals do not consider the recipient’s situation that social friction occurs. Those around Tomas may feel belittled, devalued, excluded, or ignored. Those around Juan feel heard, included, and accepted. The ability of Juan to adjust his behavior based on his awareness permits the team to hear and understand any proposal, including those from socially inept people.

To make a potentially long story short, even if Tomas had the correct solution for the problem, it was Juan’s people skills that permitted the team to find and implement a solution. Juan will likely advance in his career while Tomas will not.

If you are not familiar with emotional intelligence, you may want to read a few introductory articles. If you have not examined your ability to communicate with individuals or groups, or you have wondered how to improve your ability to influence those around you, look for material (articles, books, seminars, courses) that provides a framework to understand how to communicate effectively.

When I first became an engineer, my education and training focused on technical skills. Later I learned the importance of social skills and found that my technical prowess (little that it was) flourished as others were able to understand and accept my ideas and solutions. Plus, my contributions to meetings helped with the acceptance of other, better solutions.

There are a lot of talented people all around us. Our ability to work with them enhances our ability to implement engineering solutions that meet our business and customer expectations. Plus, working well with our team makes our work a bit more enjoyable.

Filed Under: Articles, NoMTBF

by nomtbf

The Business Skills of a Good Reliability Engineer

Knowing how to estimate sample size or create a Weibull plot is not enough today.

Just having technical skills, while essential, is not sufficient.

A master of business administration (MBA) may be helpful but is not required, yet knowing the warranty and brand cost per failure is essential.

You also need to know which analysis to conduct, how it fits into the larger program and organization, and how it impacts your customers. You need to know the business side of your work as well.

You need to understand the business connection as you create plans and finalize analysis. Each test proposed should include a business connection. Each improvement proposed has to balance with the other priorities and objectives and remain compelling.

What is Important to Your Organization?

I don’t know as I write this what is important to your organization today. It varies, even within an organization and over time. You do need to know what is important and why to become a good reliability engineer.

It is not enough to do what someone tells you to do. At first that may get you started, yet how you connect your work to adding value becomes essential. Practice finding the connection with every task.

Ask questions.

  • When do you need this information?
  • When will this need an initial review?
  • When do you need a budget estimate?
  • Who will review this report?
  • Who will need this information?
  • What level of reliability knowledge does the audience have?
  • What level of detail is necessary?

And, so on…

The idea is to find who needs what, by when, and why.

This helps you plan, focus and deliver what is considered important at the moment in your organization.

Typical Priorities

For a high volume consumer product sold during the holiday period, time to market may be the top priority. This is because a delay to shipping means the loss of sales for the year. Missing the deadline is not an option. Thus, your work as a reliability engineer has to focus on how you can minimize the risk of any delays.

Now, you know that finding a critical reliability issue late in the process may delay the program, therefore reducing the chance of a late discovery is your focus. Early testing and development work on the highest risk element may help you identify reliability risks that have a long lead time to test and evaluate.

For an industrial product with relatively low volume, the time to market pressure is not as intense. In this case, the brand image may be paramount. Thus, getting the initial units as reliable as customers expect becomes an overriding priority.

One of the issues with this situation is the lack of prototype systems to evaluate. You may not have the capability to test full systems in any quantity prior to the start of production, if ever. Thus, your work to identify and characterize each potential failure mechanism becomes critical. The ability to create a viable system model allows you to prioritize your work on improving the critical few elements that will impact system reliability performance.

Other priorities may focus on cost reduction, cutting edge feature sets, or ease of use. Your organization likely has a clear priority. It’s then up to you to connect your work in reliability engineering to that priority.

Now Connect Each Reliability Task to the Value it Provides

For each reliability task proposed or delivered, state in a clear and concise way the value it provides and how that value connects to the primary (or secondary) priority. If the task or activity doesn’t add value, why are you doing it?

In the high volume consumer product situation, reducing the risk of a shipment delay provides value. Let’s say you are proposing HALT early in the program. How would that add value? By identifying early the design or manufacturing faults that, if discovered late in the program, would delay the project, you arguably reduce the risk of delays by some amount.

For the low volume complex equipment situation, a HALT would improve the ability of the team to find and solve complex issues that may otherwise go undetected. Again, HALT would add value. In each case there is not a hard and fast formula, yet there is a clear step by step connection between the task and the value to the organization.
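One simple way to put a number on that connection is an expected-value estimate: the reduction in the chance of a late discovery, multiplied by what a delay would cost, less the cost of the task. This is a sketch of one possible framing, not a formula from this article, and every figure below is hypothetical.

```python
def expected_delay_cost(p_late_discovery: float, delay_cost: float) -> float:
    """Expected cost of a schedule slip caused by a late-discovered fault."""
    return p_late_discovery * delay_cost


def net_task_value(p_before: float, p_after: float,
                   delay_cost: float, task_cost: float) -> float:
    """Expected cost avoided by the task, minus what the task costs to run."""
    avoided = (expected_delay_cost(p_before, delay_cost)
               - expected_delay_cost(p_after, delay_cost))
    return avoided - task_cost


if __name__ == "__main__":
    # Hypothetical inputs for an early HALT on a holiday-season consumer product.
    value = net_task_value(
        p_before=0.30,         # chance of a late discovery without early HALT
        p_after=0.10,          # chance of a late discovery with early HALT
        delay_cost=2_000_000,  # lost sales if the ship date slips
        task_cost=50_000,      # samples, chamber time, and engineering hours
    )
    print(f"Estimated net value of early HALT: ${value:,.0f}")
```

The probabilities are judgment calls, so the result is best treated as a way to structure the conversation about priorities rather than as a precise forecast.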

Summary

There are two steps here. First, understand the business and customers well enough that you clearly understand what is important. Second, articulate the value related to the business priorities for each reliability task or activity.

To see more examples and ways to show value, see the book Finding Value: How to Determine the Value of Reliability Engineering Activities.

Your skill at connecting your work to value enhances your ability to focus on what is important and necessary. Thus it helps your customers, organization and your career.

Filed Under: Articles, NoMTBF

by Oleg Ivanov

Lifetime: Evaluation vs. Measurement

Guest Post by Oleg Ivanov

How can we tell whether an iron is hot enough? The answer is obvious: We can measure temperature by using a thermocouple and a meter. But, in practice, we lick our finger and touch the iron. Sizzle…. Yes, it’s hot!

We know a priori the boiling temperature of water and we can evaluate the temperature of the iron. This method has a lower cost. [Read more…]

Filed Under: Articles, NoMTBF

by nomtbf

The Technical Skills of a Good Reliability Engineer

The fundamental technical skills, as I see it, have to include statistics and root cause analysis skills. This skill set is one of three broad areas introduced in the article, What Makes the Best Reliability Engineer?

I would say these are the minimum technical skills for a good reliability engineer: being able to calculate sample size requirements, understand a dataset, and correctly determine the root causes of a failure.

There are other skills that would be great to include, such as electrical, mechanical, and software engineering, plus materials science, physics, and chemistry. Yet what separates a good reliability engineer from other types of engineers is our ability to plan and analyze life tests and to truly understand how and why failures occur.

Statistics

This is often considered the same as leaping tall buildings with a single bound with respect to skill level.

Few enjoyed their undergraduate statistics class and recently fewer campuses require a stats course. Statistics is the language of variation and is essential for our understanding of the world our products experience.

If every product met the exact specifications of the design and only operated in one set of environmental and use conditions, we would have fewer field failures. If every failure mechanism led to failure exactly the same way within each and every product, we would have far fewer field failures.

Variability may lead to elements of a product being out of spec, or drifting/wearing to an out-of-spec condition, thus failing. Variability may also lead to changes in the stress/strength relationships, again increasing the number of failures over time.
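The stress/strength relationship can be illustrated with a small calculation. This is a minimal sketch, assuming stress and strength are independent and normally distributed, which is a simplification; the numbers and function name are hypothetical.

```python
from statistics import NormalDist


def interference_failure_probability(strength_mean: float, strength_sd: float,
                                     stress_mean: float, stress_sd: float) -> float:
    """P(stress > strength) for independent, normally distributed stress and strength."""
    # The margin (strength - stress) is itself normal; failure is margin < 0.
    margin = NormalDist(
        mu=strength_mean - stress_mean,
        sigma=(strength_sd ** 2 + stress_sd ** 2) ** 0.5,
    )
    return margin.cdf(0.0)


if __name__ == "__main__":
    # Same average stress and strength; only the spread in stress differs.
    tight = interference_failure_probability(100.0, 8.0, 70.0, 6.0)
    loose = interference_failure_probability(100.0, 8.0, 70.0, 15.0)
    print(f"P(failure), stress sd = 6:  {tight:.5f}")
    print(f"P(failure), stress sd = 15: {loose:.5f}")
```

The averages never change between the two cases, yet widening the spread in applied stress raises the failure probability substantially, which is why variation and not just nominal values drives field failures.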

The ability of a good reliability engineer to use available data and statistical techniques to:

  • Estimate sample size requirements for environmental testing
  • Analyze vendor life testing results
  • Summarize field failure and warranty datasets

is just the start of our expected statistical prowess. We also need statistical skills to:

  • Monitor and control processes
  • Design and analyze screening and optimization design of experiments
  • Review and identify field failure trends and unique failure mechanisms

Your ability to use the right tool to quickly solve a problem may span statistical process control, hypothesis testing, regression analysis, and life data analysis all before noon. That may well be like stopping a speeding bullet level of skill.

You may need to master all these elements of statistics if you’re working as a lone reliability engineer, or rely on a trusted colleague if so fortunate. Either way, you need to understand enough statistics to know when and how to apply this set of technical skills.

Root Cause Analysis

Failure mechanisms are hard science – even the human factors related failures. Failures occur because something occurs at an atomic, molecular, code, or interaction level that causes an error or fault to manifest.

Your technical skill includes understanding the range of possible errors and faults that may occur with your product and how to avoid, minimize, or mitigate each one. It may not be possible to anticipate and fully understand every possible failure mechanism, thus we focus on the most likely and common, plus continue to learn about those new (or interesting) failure mechanisms that appear.

A second element to this set of skills is the ability to deduce the root cause of a failure. Given a failure, you should be able to conduct the root cause analysis to determine the underlying failure mechanism and initiating circumstance. This permits the team to take corrective action that actually works.

The skill set includes

  • Gathering evidence and understanding the relationships and contributing factors
  • Delving into the unseen elements (microscopes, cross sections, chemical analysis, etc.)
  • Replicating the failure at will

The root cause analysis skill may rely on tools like x-rays and thermal imaging tools, some operated by specialists, yet you need to know which tools to employ and how to interpret their results. It may be fun to explore failures in a well furnished failure analysis lab, yet you need to focus on solving the mystery of what caused the failure.

You also need to be well versed in how to proceed from the “crime scene” (the location where the failure occurred), through symptoms, to non-destructive and destructive testing. You need to build your “case” based on evidence and logic, plus a healthy dose of engineering knowledge of the fundamental elements involved.

If working as the lone reliability engineer, you certainly need to establish an ongoing relationship with a failure analysis lab. In other words, do not rely on your vendors; do the failure analysis work under your organization’s control, with your own lab or a contracted facility.

Get the information your team needs to solve problems or to avoid future problems by exercising your technical root cause analysis skills.

Good Reliability Work

To be good, I’m suggesting you have to have robust skills in statistics and root cause analysis. Do you agree? What else would you argue is essential to be a good reliability engineer?

Filed Under: Articles, NoMTBF

by nomtbf

Considering WIIFT When Reporting Reliability


WIIFT and Reliability Measures

WIIFT is “what’s in it for them”. It is similar to “what’s in it for me”, yet the focus is on the value you are providing your audience.

As a reliability engineer you collect, analyze, and report reliability measures. You report reliability estimates or results. Do you know how your audience is going to use this information?

Consider WIIFT when reporting reliability. [Read more…]

Filed Under: Articles, NoMTBF Tagged With: measure, metrics

by nomtbf

What makes the best Reliability Engineer?


Formal education (master’s or Ph.D.) or design/manufacturing engineering experience?

Where do you look when hiring a new reliability engineer? Do you head to the University of Maryland or another university reliability program to recruit the top talent? Or do you promote or assign from within? Where do you find the best reliability people? [Read more…]

Filed Under: Articles, NoMTBF

by nomtbf

A World of Constant Failure Rates


What if all failures occurred truly randomly?

The math would be easier.

The exponential distribution would be the only time-to-failure distribution. We wouldn’t need Weibull or other complex multi-parameter models. Knowing the failure rate for one hour would be all we would need to know to work out the chance of failure over any time frame.
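To make the "one number is enough" point concrete, here is a minimal sketch of the exponential survival function, R(t) = exp(-λt). The function name and the failure rate used are hypothetical, chosen only for illustration.

```python
import math


def reliability(failure_rate_per_hour: float, hours: float) -> float:
    """Chance of surviving `hours` when the failure rate is constant."""
    return math.exp(-failure_rate_per_hour * hours)


# Hypothetical failure rate, for illustration only.
rate = 1e-5  # failures per hour

# The same single parameter answers the question for any time frame.
for label, hours in [("one hour", 1), ("one year", 8760), ("ten years", 87600)]:
    print(f"{label}: {reliability(rate, hours):.4f}")
```

No shape parameter, age, or usage history enters the calculation, which is exactly what makes this world simpler than ours.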

Sample size and test planning would be simpler. Just run the samples at hand long enough to accumulate enough hours to provide a reasonable estimate of the failure rate.
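As a rough sketch of that test plan, the point estimate of a constant failure rate is the number of failures divided by the total accumulated unit-hours. The function name and the test data below are made up for illustration.

```python
def estimate_failure_rate(unit_hours, failures):
    """Point estimate of a constant failure rate: failures per accumulated hour.

    unit_hours: hours each sample ran (until it failed or the test stopped).
    failures:   number of samples that failed during the test.
    """
    total_hours = sum(unit_hours)
    if total_hours <= 0:
        raise ValueError("need some accumulated test time")
    return failures / total_hours


# Hypothetical test: five units, one of which failed at 400 hours.
hours_on_test = [1000, 1000, 1000, 400, 1000]
rate = estimate_failure_rate(hours_on_test, failures=1)
print(f"estimated failure rate: {rate:.2e} per hour")
```

Under the constant failure rate assumption only the total hours matter, so five units for 1,000 hours and fifty units for 100 hours would carry the same information. A test with zero failures gives a point estimate of zero, so in practice a confidence bound would be reported instead.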

Would the Design Process Change?

Yes, I suppose it would. The effects of early life and wear out would not exist. Once a product is placed into service, the chance of failing in the first hour would be the same as in any other hour of its operation. It would fail eventually, and the chance of failing before a year would depend solely on the chance of failure per hour.

A higher failure rate would suggest a lower chance of surviving very long. Yet the product could fail in the first hour of use just as easily as in any later hour, and if it had survived for one million hours, its chance of failing in the next hour would still be the same.
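This is the memoryless property of the exponential distribution. As a short derivation, with T the time to failure, λ the constant failure rate, s the hours already survived, and t the additional hours of interest (symbols introduced here for illustration):

```latex
P(T > s + t \mid T > s)
  = \frac{P(T > s + t)}{P(T > s)}
  = \frac{e^{-\lambda (s + t)}}{e^{-\lambda s}}
  = e^{-\lambda t}
  = P(T > t)
```

The hours already survived, s, cancel out, so age tells us nothing about the chance of failing in the next hour.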

Would Warranty Make Sense?

Since by design we cannot create a product with a low initial failure rate, we would focus only on the overall failure rate, that is, the chance of failing over any hour. The first hour would be convenient and easy to test, yet still meaningful. Any single failure in a customer’s hands could occur at any time and would not alone suggest the failure rate has changed.

Maybe a warranty would make sense based on customer satisfaction. We could estimate the number of failures over a time period and set aside funds for warranty expenses. I suppose it would place a burden on the design team to create products with a lower failure rate per hour. Maybe warranty would still make sense.
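A minimal sketch of that warranty estimate follows. It assumes every unit runs for the full warranty period, every failure becomes a claim, and only first failures are counted. The function names, shipment volume, failure rate, and cost per claim are all hypothetical.

```python
import math


def expected_warranty_failures(units_shipped, failure_rate_per_hour, warranty_hours):
    """Expected number of units failing at least once within the warranty period."""
    fraction_failing = 1.0 - math.exp(-failure_rate_per_hour * warranty_hours)
    return units_shipped * fraction_failing


def warranty_reserve(units_shipped, failure_rate_per_hour, warranty_hours, cost_per_claim):
    """Funds to set aside, assuming a fixed average cost per claim."""
    failures = expected_warranty_failures(
        units_shipped, failure_rate_per_hour, warranty_hours
    )
    return failures * cost_per_claim


# Hypothetical numbers: 10,000 units, one year of continuous use.
failures = expected_warranty_failures(10_000, 2e-6, 8760)
reserve = warranty_reserve(10_000, 2e-6, 8760, cost_per_claim=50.0)
print(f"expected failures: {failures:.1f}, reserve: {reserve:.2f}")
```

Halving the failure rate per hour roughly halves the reserve when the fraction failing is small, which is the burden on the design team mentioned above.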

How About Maintenance?

If there are no wear out mechanisms (this is a make-believe world), changing the oil in your car would not make any economic sense. The existing oil carries the same chance of an engine seizure as new oil. The lubricant doesn’t break down. Seals do not leak. Metal-on-metal movement doesn’t cause damaging heat or abrasion.

You may have to replace a car tire due to a nail puncture, yet an accident due to worn tire tread would be no more likely than with new tires. We wouldn’t need to monitor tire tread or brake pad wear. Wear wouldn’t occur.

If a motor is running now and we know the failure rate, we can calculate the chance of it running for the rest of the shift, even when the motor is as old as the building.
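One way to check that claim is a small simulation: draw exponential lifetimes, keep only the motors still running at a given age, and see what fraction finish one more shift. This is a sketch under assumed values; the failure rate, ages, shift length, and function name are hypothetical.

```python
import math
import random


def simulate_shift_survival(rate_per_hour, age_hours, shift_hours,
                            n_motors=200_000, seed=0):
    """Fraction of motors still running at `age_hours` that also finish the shift."""
    rng = random.Random(seed)
    lifetimes = [rng.expovariate(rate_per_hour) for _ in range(n_motors)]
    still_running = [t for t in lifetimes if t > age_hours]
    if not still_running:
        raise ValueError("no simulated motor survived to the requested age")
    finished = sum(1 for t in still_running if t > age_hours + shift_hours)
    return finished / len(still_running)


rate, shift = 0.01, 8.0  # hypothetical: failures per hour, hours left in shift
print(f"closed form exp(-rate * shift): {math.exp(-rate * shift):.3f}")
for age in (0.0, 100.0, 300.0):
    print(f"age {age:5.0f} h: {simulate_shift_survival(rate, age, shift):.3f}")
```

Within sampling noise, the simulated fraction should match the closed-form value at every age, which is why the age of the motor drops out of the calculation.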

The concepts of reliability centered maintenance, predictive maintenance, or even preventive maintenance would not make sense. There would be no advantage to swapping a part for a new one, as the chance to fail would remain the same.

Physics of Failure and Prognostic Health Management – would they make sense?

Understanding failure mechanisms so we could reduce the chance of failure would remain important. Yet when the items that fail do not

  • Accumulate damage
  • Drift
  • Wear
  • Abrade
  • Diffuse
  • Degrade
  • Etc.

then much of the predictive power of PoF and PHM would not be relevant. We wouldn’t need sensors to monitor conditions that lead to failure, as no specific failure would show a sign or indication before it occurred. Nothing would indicate an item was about to fail, as that would imply its chance of failure had changed.

No more tune-ups or inspections; we would pursue repairs when a failure occurs, not before.

A world of random failures, or a world of failures each of which occurs at a constant rate, would be quite different from our world. So, why do we so often make this assumption?

Filed Under: Articles, NoMTBF




© 2025 FMS Reliability · Privacy Policy · Terms of Service · Cookies Policy