Accendo Reliability

Your Reliability Engineering Professional Development Site


by Fred Schenkelberg

Data Outliers and Questions

When looking at a pile of data, sometimes one data point is not like the others. It attracts attention because it differs from the rest of the data.

When I spot something odd in a dataset, I wonder if there is something to learn here. Is this an opportunity to make a discovery or improve a process?

All too often it is tempting to remove the outlier as a mistake, or to drop it because it doesn’t make any sense and ‘messes up’ the analysis.

The Definition of an Outlier

My computer’s built-in dictionary defines an outlier, in relation to statistics, as:

a data point on a graph or in a set of results that is very much bigger or smaller than the next nearest data point.

A couple of other definitions that may be helpful are:

A physical defect that does not correlate with a known process, equipment or procedure and is outside the expected or actual probability-density function of time or location.

An apparent deviant observation in a sample. 

The hard part of these definitions is that they do not say how much difference has to exist before a point counts as an outlier. There are general guidelines, yet none are crisp enough to clearly determine whether a bit of information is unique or as expected.
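As an illustration of one such general guideline, here is a minimal sketch of Tukey's fences, which flag points lying more than 1.5 interquartile ranges beyond the quartiles. This rule is not named in the article; it is one common convention among several, and the multiplier `k`, the function names, and the `readings` values are assumptions made up for the example.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences placed k IQRs beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values, k=1.5):
    """Return the values that fall outside the Tukey fences."""
    lower, upper = tukey_fences(values, k)
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements, invented for illustration only.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 13.7]
suspects = flag_outliers(readings)
print(suspects)
```

A flagged point is only a candidate for investigation. Changing `k` or the quartile method moves the fences, which is exactly why such guidelines are not crisp.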

Causes of Outliers

There are three main reasons an outlier may appear in your dataset.

  1. It is the result of a clerical error. The data point experienced a transcription error, such as a transposed or extra digit, a measurement recorded in the wrong field, or some other benign error.
  2. It is the rare yet expected event within the variation of the subject of the measurements. If you draw a random sample, there is a finite probability of selecting an item that is more than 4 standard deviations from the center of the population. It can happen, yet is rare.
  3. Something changed or is really just different in some meaningful manner. We use this concept for control charts to spot changes in a process. The outlier is from a different process, or something has changed in the process that makes it more likely for this item to have a ‘strange’ value.
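To put a number on how rare the second cause is, the sketch below computes the chance of a value landing more than 4 standard deviations from the center. It assumes the measurements are independent and normally distributed, which the article does not state; the function names and the sample size of 1000 are likewise assumptions for illustration.

```python
import math


def two_sided_tail(z):
    """P(|Z| > z) for a standard normal variable Z."""
    return math.erfc(z / math.sqrt(2))


def prob_at_least_one(z, n):
    """Chance that at least one of n independent draws lies beyond z sigma."""
    p = two_sided_tail(z)
    return 1.0 - (1.0 - p) ** n


p_single = two_sided_tail(4.0)
p_in_1000 = prob_at_least_one(4.0, 1000)
print(f"single draw beyond 4 sigma: {p_single:.2e}")
print(f"at least one in 1000 draws: {p_in_1000:.3f}")
```

Under these assumptions a single draw falls beyond 4 standard deviations roughly 6 times in 100,000, yet across 1000 draws the chance of seeing at least one is about 6%. Rare for one item does not mean never in a large dataset.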

There are other reasons that are worth mentioning:

  • A large fluctuation in measurement error
  • Sampling an item from a different population
  • Sampling bias such that nearly all items selected are similar, except one or a few
  • Non-random sampling practices
  • Process startup or shut down anomalies
  • Material degradation
  • Damaged sample

Clearly, more than just clerical errors are at play in creating outliers.

Questions, Troubleshooting, and Understanding Outliers

When you identify what is possibly an outlier, what do you do? Hopefully, your action is not to quickly dismiss the data point and move on.

I recommend that you start asking questions:

  • Given what is known about the population, is this measurement possible?
  • Is there anything about the sample obviously different than other samples?
  • What else is different about this sample?
  • Are the measurement devices and the measurement process stable and capable?
  • Where could errors have entered the data stream between measurement and now?
  • Where in the sampling process could we create subgroups with a bias toward one group over others?
  • Is there more than one path to create the feature measured?
  • Is the outlier worth investigating to learn how it occurred (is it better and worth replicating, or worse and worth avoiding)?
  • Was this likely a clerical error?

As a last resort, I recommend conducting your data analysis with and without the outlier data. If the results and next steps based on the analysis do not change with or without the outliers, then leave the outliers in the dataset. If the result does change, you need to work to understand whether the outliers represent nothing more than a simple error, the actual variation to expect, or a physical/chemical change. The last case may be your big discovery.
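The with-and-without comparison can be sketched as below. The `readings`, the `suspect` value, and the decision rule (act when the mean exceeds a hypothetical `action_limit` of 10.3) are all invented for illustration; substitute whatever analysis and decision your own work calls for.

```python
import statistics


def summarize(values):
    """Basic summary statistics for a list of measurements."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


# Hypothetical measurements and suspected outlier.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 13.7]
suspect = 13.7

with_outlier = summarize(readings)
without_outlier = summarize([v for v in readings if v != suspect])

# Hypothetical decision rule: take action if the mean exceeds this limit.
action_limit = 10.3
decision_with = with_outlier["mean"] > action_limit
decision_without = without_outlier["mean"] > action_limit
decision_changes = decision_with != decision_without

print(with_outlier)
print(without_outlier)
print("decision changes:", decision_changes)
```

Here the single suspect point flips the decision, so by the reasoning above it deserves investigation before any action is taken. Had the decision been the same both ways, the point could simply stay in the dataset.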

Summary

When dealing with data, you will find outliers, those items not like the others. Such bits of data may occur for many reasons and may represent something quite novel and interesting, or simply an error.

As with a root cause analysis, you should only take action once you know the underlying cause(s) of the outlier information. Your action may be to improve the measurement system or data collection process, to launch a study to improve your process, or even to launch a study to confirm your discovery.

What is your definition of an outlier and what do you do when you find one?

About Fred Schenkelberg

I am the reliability expert at FMS Reliability, a reliability engineering and management consulting firm I founded in 2004. I left Hewlett Packard (HP)’s Reliability Team, where I helped create a culture of reliability across the corporation, to assist other organizations.
