First Steps with Data

by Fred Schenkelberg

Once word got out that I was taking graduate-level courses in statistics, I dreaded the knock on the door. Colleagues, some of whom I knew and others from far reaches of the company, would ask if I could take a look at their data. Class didn’t teach me the necessary first steps to take with a stack of data.

I’ve lost count of the number of datasets I’ve reviewed and analyzed. I know there are important considerations and questions to address before creating the first plot. Let’s review the essential first steps you should take when presented with data.

Is there a decision related to this data?

Why are you looking at this data? I find it difficult not to jump straight in and start the analysis, yet which analysis should you attempt? A great question to answer is what decision this dataset is meant to inform. Is this a comparison, an optimization, or an exploration?

If the question is “Will this design create a product that meets our reliability goal?” that helps to guide your next steps. If the decision is about which vendor better meets our requirements, that suggests a range of analysis options.

The type, quality, and quantity of data depend on the decision the analysis is to inform. Thus, when first encountering a dataset, start with the information you will need from the data. Then assess whether the data is sufficient to provide that information.

Data Collection and Errors

Let’s say the dataset provided has 1,000 entries, each the time to failure of one of 1,000 products. This would be complete data if the organization only shipped 1,000 units. If they shipped 100,000 units, what happened to the other 99,000? While failure data alone is fine for some situations, it is not enough information to estimate the impact on future warranty claims.
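
As an illustration (not from the article), here is a minimal Python/pandas sketch of how the units that have not failed can be carried as right-censored records alongside the observed failures, so a later analysis sees the whole shipped population rather than only the failures. The counts and times are made up:

```python
import pandas as pd

# Hypothetical field data: a handful of observed failure times stand in for the
# failed units; the rest of the shipped population has not failed yet.
failures = pd.DataFrame({
    "hours": [150.0, 320.0, 410.0, 905.0],  # time to failure (placeholder values)
    "failed": 1,                            # 1 = failure observed
})

# Units still operating in the field are right-censored at their current age.
survivors = pd.DataFrame({
    "hours": [1000.0] * 6,                  # age at last observation (placeholder values)
    "failed": 0,                            # 0 = no failure yet (censored)
})

field_data = pd.concat([failures, survivors], ignore_index=True)
print(field_data)
```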

An often-forgotten aspect of data collection is measurement error. Every measurement system has some error included. None are perfect. Understanding the measurement system used to collect the data may prompt additional questions on the quality of the data and the magnitude of the measurement error.

Another detail to understand concerns the completeness of the data. Is the data a random or not-so-random sample? Or does the dataset include measurements from all items in the population? This affects the type of analysis and how to interpret the results.

Consider the measurement frequency. A measurement system that records events as they happen is different from one that checks for events once a month. Interval data requires different handling and analysis.
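
For instance (a hypothetical Python/pandas sketch with invented inspection dates), interval data from monthly checks can be recorded as a pair of bounds per unit rather than an exact failure time:

```python
import pandas as pd

# Hypothetical monthly inspections: a failure found at the March check is only
# known to have occurred sometime after the February check passed.
intervals = pd.DataFrame({
    "unit": ["A", "B", "C"],
    "last_passed": pd.to_datetime(["2023-02-01", "2023-03-01", "2023-03-01"]),
    "found_failed": pd.to_datetime(["2023-03-01", "2023-04-01", None]),  # NaT = still running
})
print(intervals)
```

An interval-censored analysis uses both bounds; simply treating the inspection date as the failure time can bias the results.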

Data format and organization

To this point, we haven’t looked at the data within the dataset. Take a look at the data now. This is the start of the data clean-up process. Things like missing values or variations in how dates were recorded affect a software package’s ability to use the data. Are the missing data a clerical error, or were they omitted deliberately?
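
A first clean-up pass might look like this sketch, assuming Python with pandas and a hypothetical file field_data.csv with a test_date column (the file and column names are illustrative):

```python
import pandas as pd

df = pd.read_csv("field_data.csv")  # hypothetical file and column names

# Where is data missing, and how much of it?
print(df.isna().sum())

# Normalize inconsistently recorded dates; entries that cannot be parsed
# become NaT so they are easy to find and question.
df["test_date"] = pd.to_datetime(df["test_date"], errors="coerce")
print(df[df["test_date"].isna()])
```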

To understand the dataset, the columns should have informative labels. “Column 1,” “Column 2,” etc., do not convey what is within each column. Dozens of columns of 4-digit numbers without labels or a legend are not useful.

While some software packages can handle data presented using Nevada charts, not all can. This may require organizing the data for the intended analysis.
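
As a sketch of that reorganization (assuming Python/pandas and made-up numbers), a Nevada-style table with shipment months as rows and return months as columns can be reshaped into the one-row-per-ship-month-and-return-month layout many tools expect:

```python
import pandas as pd

# Hypothetical Nevada-style table: rows are shipment months, columns are the
# months in which returns were observed, cells are counts of returned units.
nevada = pd.DataFrame(
    {"Jan": [2.0, None], "Feb": [3.0, 1.0], "Mar": [1.0, 4.0]},
    index=pd.Index(["Dec", "Jan"], name="ship_month"),
)

# Reshape to long format: one row per (ship_month, return_month) pair.
long_form = (
    nevada.stack()
    .rename("returns")
    .reset_index()
    .rename(columns={"level_1": "return_month"})
)
print(long_form)
```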

One thing that has often caused me problems is a column of numbers with a few entries stored as text. These are hard to spot, yet most software packages balk when they expect numbers and encounter a field containing text.
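
One way to hunt these down (a sketch assuming Python/pandas; the values are invented) is to coerce the column to numeric and look at what refuses to convert:

```python
import pandas as pd

# Hypothetical column where a few entries were typed as text.
readings = pd.Series([12.3, 14.1, "n/a", 13.8, "15,2"])

# Coercion turns non-numeric entries into NaN, which makes them easy to locate.
as_numbers = pd.to_numeric(readings, errors="coerce")
print(readings[as_numbers.isna()])  # the entries stored as text
```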

Exploring the data

OK, you understand the decision this data is to inform, and you understand the dataset: how it was collected, its measurement error, and how it is organized. Great. Now it’s time to start the analysis. Or is it?

At this point, I recommend plotting the data in a few different ways. Visualize the data to identify basics like the shape or structure of the data. Time series plots, XY plots, and others provide basic information concerning the nature of the data.

For example, if you plot a column by the date collected and find long gaps between clusters of measurements, it is worth understanding why those gaps occurred. Another example is a sudden change in the magnitude of the data: the values start as single-digit numbers and then jump to 7-digit values. Was that a change in the measurement system’s scale, or does it accurately reflect what happened?
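
A quick first look might be as simple as the following sketch (Python with pandas and matplotlib; the dataframe and column names are invented), plotting the same column two ways to surface gaps, jumps, and the overall shape:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data: one measurement per day for 60 days.
df = pd.DataFrame({
    "test_date": pd.date_range("2023-01-01", periods=60, freq="D"),
    "measurement": pd.Series(range(60), dtype=float),
})

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
axes[0].plot(df["test_date"], df["measurement"])  # time series: gaps and jumps stand out
axes[0].set_title("Measurement over time")
axes[1].hist(df["measurement"], bins=15)          # histogram: shape of the distribution
axes[1].set_title("Distribution")
plt.tight_layout()
plt.show()
```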

The first step is to know the data, its history, and its behavior. Then do the actual work to conduct the analysis.

Filed Under: Articles, Musings on Reliability and Maintenance Topics, on Product Reliability

About Fred Schenkelberg

I am the reliability expert at FMS Reliability, a reliability engineering and management consulting firm I founded in 2004. I left the Hewlett Packard (HP) Reliability Team, where I helped create a culture of reliability across the corporation, to assist other organizations.


Comments

  1. Larry George says

    February 22, 2023 at 4:33 PM

    Thanks for your observations on data. It’s good to ask: Why are you doing what you do with the data? What would it be worth to get more data?
    You’re right, not all data come in the form of a relational db: records in rows, factors in columns. Inferring missing values scares me.
    Stay tuned for my next article. “While some software packages can handle data presented using Nevada charts, not all can.” Those software packages make Kaplan-Meier reliability estimates, and they usually include the variances of the reliability estimates and maybe even confidence bands. Don’t believe them.

