Accendo Reliability

Your Reliability Engineering Professional Development Site

by nomtbf

A Life Data Analysis Challenge

Here is a Challenge: Life Data Analysis

Some years ago a few colleagues compared notes on the results of a Weibull analysis. Interestingly, we all started with the same data and got different results.

After a recent article on the many ways to accomplish data analysis, Larry mentioned that all one needs is shipments and returns to perform field data analysis.

This got me thinking: What are our common methods and sets of results when we perform life data analysis?

The Life Data Analysis Challenge

So, here’s a challenge: Given the data in this life-data-challenge.csv file, perform an analysis to answer two questions:

  1. How many returns should we expect next month?

  2. Is the rate of returns increasing or decreasing?

  3. [Bonus question] Based on your analysis and experience, what questions should we answer next?

Here is the data, life-data-challenge.csv

Notes About the Data

The data is made up and kept relatively simple to allow a wide range of analysis approaches. The data represent the time to failure in days. The days are counted from shipment until the day the customer reported the failure, including weekends and holidays.

The item is a battery-powered portable hand drill for use by a home workshop or woodworking enthusiast. In other words, not a contractor. The drill is used sporadically for a wide range of uses and situations around a person’s home, office, or workshop.

To keep things very simple there were 1,000 units shipped on one day and the failure data is all from that one day of shipments. Not all units have failed, only 75 have failed.

The data is in one column, unsorted and in no particular order.

Reporting Your Results

There are two main points in this challenge.

First, please answer the two (three) challenge questions based on your analysis. Provide a summary of your analysis, graphics, charts, or whatever makes sense for us (me and your peers) to understand your results and how you got them.

Second, please comment on what, if any, assumptions you made for your analysis. For example, if you assume the data is exponentially distributed (please, I really hope not!), list that as an assumption.

Third, I really do have a problem with keeping to two points today, please comment on what additional information you would like to have available, if any, to improve your analysis.

Please add your results to the comments section below, or email them to me (Fred) at fms@nomtbf.com

That is the challenge. Looking forward to your results and analysis.

Thanks for taking part and enjoy.

Filed Under: Articles, NoMTBF


Comments

  1. Jurgita Simaityte says

    May 29, 2017 at 12:21 PM

    Hello Fred! I opened the data on my phone, but I do not see any state indicating censored or failed units. Which ones failed, and on which day did these 75 fail?

    • Fred Schenkelberg says

      May 29, 2017 at 12:40 PM

      Hi Jurgita,

      All 75 in the data file have failed. There is no column to indicate censored or failed, as all in the data set have failed. The value is days until failure. The total number of units shipped is 1,000, thus 1,000 – 75 = 925 are right censored.

      Hope that helps.

      Cheers,

      Fred

  2. Jurgita says

    May 29, 2017 at 11:40 PM

    Ok, now understood, thank you, Fred! One more question: do you have a deadline for this challenge? The date at the top is the 24th of May; does that mean it has finished already? 🙂

    • Fred Schenkelberg says

      May 30, 2017 at 6:54 AM

      No deadline – the date is when the post was published. cheers, Fred

  3. Oleg Ivanov says

    June 1, 2017 at 5:18 AM

    Hi Fred,

    I think you agree that failure times are enough for a statistics engineer but not enough for a reliability engineer. We need to know the failed part, the failure mechanism, and the cause of failure.
    Based on this data I can say that 75 products out of 1,000 had some kind of manufacturing defect and have failed. The failure time has a Weibull distribution (beta=2.5; eta=2000). I think this defect has “burned out” and does not appear in the rest of the products.

    Thanks for the interesting question.

    • Fred Schenkelberg says

      June 1, 2017 at 6:03 AM

      Hi Oleg, we have very little information about the failure mechanisms, etc., behind the data. So, based on the analysis of the available data, what questions do you have?

      For your analysis, how did you treat the censored data? What analysis approach did you take? Which software package and what assumptions or settings?

      For example, using Weibull++ and ignoring the 925 right-censored points, I get one fit. Adding the censored data, assuming the last point in the data is the censor point, and using rank regression or MLE, I get two other answers.

      I have found that other software packages provide different answers as well.

      So, two questions, which is right and why? Based on your analysis, rather than state conclusions, what questions should one be asking to help make the right conclusions?

      Cheers,

      Fred

  4. Ricardo says

    June 22, 2017 at 6:11 AM

    Hi Fred,

    In this example, the operational conditions of one hand drill and another can vary a lot (the item that failed after 515 days and the item that failed after 1460 days may have been used in very different ways – load, time cycle, environment…). So the first question could be: can we group failures by use pattern? The second question could be related to the failure reporting system (the questions mentioned by Oleg: failure effect? failure mode? potential failure causes? etc.)

    With no more data and from a pure statistical point of view I can share these three approaches:

    A. Parametric estimation approach without taking into account the censored data:
    – Rank Time To Failure data
    – Benard approximation for the time to failure probability
    – Least Squares fit to Weibull (R^2=97.1%): BETA=2.48; ETA=2002
    Question 1 response: 925 units x [1- R(3673+30 days)/R(3673 days)] = 84 expected returns during the next month.
    Question 2 response: BETA >1 –> increasing failure rate –> increasing return rate.

    In this case we do not use the information that 925 units have survived 3673 days and our estimation could be very conservative…
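
Approach A can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the spreadsheet or tool Ricardo used: the function names are hypothetical, the regression is of ln(-ln(1 - F)) on ln(t) with Benard's median ranks, and the last line plugs the rounded parameters quoted above (BETA=2.48, ETA=2002) into the conditional formula 925 x [1 - R(3673+30)/R(3673)].

```python
import math

def benard_median_ranks(n):
    """Benard's approximation to the median rank of the i-th of n ordered failures."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]

def weibull_rank_regression(failure_times):
    """Least-squares fit of ln(-ln(1 - F)) on ln(t); returns (beta, eta).

    Censored units are ignored here, as in approach A.
    """
    times = sorted(failure_times)
    ranks = benard_median_ranks(len(times))
    xs = [math.log(t) for t in times]
    ys = [math.log(-math.log(1.0 - f)) for f in ranks]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta = sxy / sxx                       # slope of the fitted line
    eta = math.exp(x_bar - y_bar / beta)   # intercept is -beta * ln(eta)
    return beta, eta

def weibull_reliability(t, beta, eta):
    """Two-parameter Weibull reliability R(t)."""
    return math.exp(-((t / eta) ** beta))

def expected_returns(n_at_risk, age, horizon, beta, eta):
    """n_at_risk * [1 - R(age + horizon) / R(age)]."""
    survive = weibull_reliability(age + horizon, beta, eta) / weibull_reliability(age, beta, eta)
    return n_at_risk * (1.0 - survive)

# Rounded parameters quoted in the comment; 925 survivors at 3673 days, 30-day horizon.
print(round(expected_returns(925, 3673, 30, 2.48, 2002), 1))
```

With the rounded parameters the sketch lands in the low 80s, close to the 84 quoted; the small gap is presumably due to rounding of BETA and ETA.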

    B. Parametric estimation approach taking into account the 925 right-censored units:
    – Rank Time To Failure data
    – Mean order number and Benard approximation for time to failure probability
    – Least Squares fit to LogNormal (R^2=94.1%): MEAN=9.86; STANDARD DEVIATION=1.32
    Question 1 response: 925 units x [1- R(3673+30 days)/R(3673 days)] = 2 expected returns during the next month.
    Question 2 response: Failure rate function is increasing with time –> increasing return rate

    In this case, we have used the censored data information, but they represent a big proportion of the data (>90%). Could our estimation be very optimistic? I guess it could, but I think we should use this information in the analysis.
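
The same conditional calculation for approach B, sketched for a lognormal fit. It assumes that MEAN=9.86 and STANDARD DEVIATION=1.32 are the mean and standard deviation of ln(days), which the comment does not state explicitly; the function names are hypothetical. The hazard function is included only to check the direction of the failure rate at the current age.

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def lognormal_reliability(t, mu, sigma):
    """R(t) = 1 - Phi((ln t - mu) / sigma), with mu and sigma on the log scale."""
    return 1.0 - std_normal_cdf((math.log(t) - mu) / sigma)

def lognormal_hazard(t, mu, sigma):
    """Hazard h(t) = f(t) / R(t) for the lognormal distribution."""
    z = (math.log(t) - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))
    return pdf / lognormal_reliability(t, mu, sigma)

def expected_returns(n_at_risk, age, horizon, mu, sigma):
    """n_at_risk * [1 - R(age + horizon) / R(age)]."""
    survive = lognormal_reliability(age + horizon, mu, sigma) / lognormal_reliability(age, mu, sigma)
    return n_at_risk * (1.0 - survive)

mu, sigma = 9.86, 1.32   # values quoted in the comment, assumed to be on the ln(days) scale
print(round(expected_returns(925, 3673, 30, mu, sigma), 2))
print(lognormal_hazard(3703, mu, sigma) > lognormal_hazard(3673, mu, sigma))
```

With these rounded parameters the sketch gives a figure between 1 and 2 returns, in line with the roughly 2 quoted once rounded up to whole units, and the hazard is still rising at 3673 days.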

    C. Your “beloved” in-service MTBF = 1778 days
    – Average failure rate during the period = 1/1778
    Question 1 response: (constant failure rate assumption during next periods) 925 units x 1/1778 x 30 days = 16 expected returns during next month
    Question 2 response: constant failure rate assumption during next periods –> constant return rate

    In this case, the assumption should be checked (if possible…). Here we cannot know whether our approach is conservative or optimistic, as we have made our analysis too simple…
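
The arithmetic in approach C is short enough to check directly. The sketch below, with hypothetical function names, reproduces the linear calculation (units x rate x time) and, for comparison, the same constant-rate assumption without the linear shortcut.

```python
import math

def expected_returns_linear(n_at_risk, mtbf_days, horizon_days):
    """Units at risk x constant failure rate (1 / MTBF) x time."""
    return n_at_risk * horizon_days / mtbf_days

def expected_returns_exponential(n_at_risk, mtbf_days, horizon_days):
    """Same constant-rate assumption, using N * (1 - exp(-t / MTBF))."""
    return n_at_risk * (1.0 - math.exp(-horizon_days / mtbf_days))

linear = expected_returns_linear(925, 1778, 30)
exact = expected_returns_exponential(925, 1778, 30)
print(round(linear, 1), round(exact, 1))
```

Both versions give between 15 and 16 returns, so the shortcut makes little difference over a 30-day horizon; the constant-rate assumption itself is the part that needs checking.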

    I am looking forward to hearing your thoughts

    Cheers,
    Ricardo

    • Fred Schenkelberg says

      June 22, 2017 at 4:26 PM

      Thanks, Ricardo, for the detailed analysis. I think the Weibull with censored data is the way to go, although I would use the Maximum Likelihood Estimation method given the large number of censored data points. Cheers, Fred

  5. Adrien says

    June 26, 2017 at 12:00 AM

    Using a 2-parameter Weibull with the MLE method, without taking censoring into account:
    beta = 2.47
    eta = 2010

    The Rank Regression X and Y methods give about the same results.
    But the Kolmogorov-Smirnov test rejects the goodness-of-fit hypothesis.

    Using a 3-parameter Weibull with MLE gives the following results:

    Sub-Pop 1
    beta1 = 3.5
    eta1 = 1248
    p1 = 0.453443475 (proportion of sub-pop 1)

    Sub-Pop 2
    beta2 = 4.435635
    eta2 = 2550.594619
    p2 = 0.546556525 (proportion of sub-pop 2)

    The Kolmogorov-Smirnov test does not reject the goodness-of-fit hypothesis.

    1/ Assuming that the end time is 3673 days, the failure rate at 3673 days is 6.09e-3 per day, so the number of failures over 30 more days is: 6.09e-3 * 925 * 30 = 169
    2/ The lower bound on Beta1 at a 90% confidence level is 2.53, and the lower bound on Beta2 at a 90% confidence level is 2.85, so the failure rate is increasing.
    3/ To take the censored data into account, what is the failure-free operating time of all the other units? What is the maintenance policy: are the failed units removed or repaired?
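
The failure rate quoted in point 1/ can be reproduced from the two sub-population fits. The sketch below assumes the model is a mixture of two 2-parameter Weibull distributions with the proportions, betas, and etas listed above, and that the hazard of the mixture is the mixture density divided by the mixture reliability; the function names are hypothetical and this is not Weibull++ output.

```python
import math

def weibull_pdf(t, beta, eta):
    """Two-parameter Weibull density."""
    return (beta / eta) * (t / eta) ** (beta - 1.0) * math.exp(-((t / eta) ** beta))

def weibull_reliability(t, beta, eta):
    """Two-parameter Weibull reliability R(t)."""
    return math.exp(-((t / eta) ** beta))

def mixture_hazard(t, components):
    """Hazard of a Weibull mixture; components is a list of (p, beta, eta)."""
    density = sum(p * weibull_pdf(t, b, e) for p, b, e in components)
    reliability = sum(p * weibull_reliability(t, b, e) for p, b, e in components)
    return density / reliability

# Sub-population parameters as quoted in the comment.
components = [
    (0.453443475, 3.5, 1248.0),
    (0.546556525, 4.435635, 2550.594619),
]

rate = mixture_hazard(3673, components)
expected = rate * 925 * 30   # rate x units at risk x days, a linear approximation
print(f"{rate:.3e} per day, about {expected:.0f} returns in 30 days")
```

At 3673 days nearly all of sub-population 1 has already failed under this model, so the mixture hazard is essentially that of sub-population 2. Note that the model was fitted to the 75 failures only, so applying its hazard to the 925 survivors inherits that assumption.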

    • Fred Schenkelberg says

      June 26, 2017 at 1:07 PM

      Thanks Adrien, which software package (or by hand?) did you use for the analysis? Thanks for taking the challenge. Why consider the regression without the censored data? Does that make sense? Cheers, Fred

      • Adrien says

        July 3, 2017 at 12:28 AM

        The software is Weibull++ (Reliasoft).
        Censoring is not handled well by regression methods, as they use the rank (median rank) instead of the exact time. So with heavy censoring (here (1000-75)/1000 = 92.5%) the results may be wrong.
        MLE uses the exact time of censoring.
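
To make the point about MLE concrete, here is a minimal sketch of a Weibull maximum likelihood fit with right censoring, using only the Python standard library. It assumes all survivors are censored at one common time (for this challenge that would be 925 units at an assumed 3673 days) and solves the profile likelihood equation for beta by bisection. The demonstration data are synthetic quantiles from a hypothetical Weibull (beta=2, eta=8000), not the challenge file, and the function name is an invention for this example.

```python
import math

def weibull_mle_right_censored(failures, n_censored, censor_time):
    """Return (beta, eta) maximising the right-censored Weibull likelihood.

    failures: exact failure times; n_censored units survived to censor_time.
    """
    r = len(failures)
    scale = max(max(failures), censor_time)   # rescale so powers stay <= 1
    fail = [t / scale for t in failures]
    cens = censor_time / scale
    mean_log_fail = sum(math.log(t) for t in fail) / r

    def power_sum(beta):
        # Every unit contributes its exact time, failed or censored.
        return sum(t ** beta for t in fail) + n_censored * cens ** beta

    def score(beta):
        # Profile likelihood equation in beta; increasing, zero at the MLE.
        s1 = (sum(t ** beta * math.log(t) for t in fail)
              + n_censored * cens ** beta * math.log(cens))
        return s1 / power_sum(beta) - 1.0 / beta - mean_log_fail

    lo, hi = 0.05, 50.0
    for _ in range(200):                      # bisection on the score
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    beta = 0.5 * (lo + hi)
    eta = scale * (power_sum(beta) / r) ** (1.0 / beta)
    return beta, eta

# Synthetic check: quantiles of a hypothetical Weibull, censored at 3673 days.
true_beta, true_eta, n, t_c = 2.0, 8000.0, 1000, 3673.0
quantiles = [true_eta * (-math.log(1.0 - (i - 0.5) / n)) ** (1.0 / true_beta)
             for i in range(1, n + 1)]
failures = [t for t in quantiles if t <= t_c]
beta_hat, eta_hat = weibull_mle_right_censored(failures, n - len(failures), t_c)
print(len(failures), round(beta_hat, 2), round(eta_hat))
```

The censoring time enters the likelihood directly through the sum over all units, which is the contrast with rank regression drawn above.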

        • Fred Schenkelberg says

          July 3, 2017 at 7:50 AM

          Thanks Adrien – I agree that we need to account for the censored units and do so in a meaningful manner, which means using MLE.

          Cheers,

          Fred

          • neel sharavana says

            July 8, 2017 at 2:48 AM

            Dear sir, greetings from India. I am an engineer with no reliability background, and I am seeking your help and advice. I admire your NoMTBF blog.

            I work for a firm with several big engineering systems, and we have been using MTBF and MTTR with an exponential distribution assumption to calculate system reliability, etc.

            I need your help to move away from MTBF and implement a better study methodology. Our systems are used for 10-15 years and have non-repairable LRUs (line-replaceable units), which are replaced from spare stock whenever a failure occurs. Overall, the main systems are repairable, but the failing spares are non-repairable.

            We have a robust failure reporting system in which users report monthly, so that we can carry out field reliability and maintainability analysis on field failures and field performance. The monthly feedback forms include data such as number of running hours, cause of failure, spare replaced, repair time, and system downtime.

            Can you kindly help me with a simple system for standardized field failure data analysis, without too much math and without too much reliance on software tools like RELEX?

            I will be very grateful for your kind help, given your expertise and knowledge in the field. Sincere regards, Mr. Sharavana Gowda from Mumbai, India

  6. Mark Powell says

    January 2, 2019 at 10:45 PM

    Fred,

    I am late to this, but it looks like fun.

    I don’t understand how anybody could get any answers though. I am seeing plug and chug without problem analysis.

    You said 1,000 were shipped on a single day. When was that day relative to the date you reported the data (I am presuming for question one that you are looking for the next 30 days beyond this report date, please confirm). Obviously it has to be at least as many days as the largest failure time, but if the ship date were 25,000 days before the report date, or the ship date was exactly the number of days of the largest failure, it makes a huge difference.

    These are seemingly innocuous and ordinary questions. But they are not trivial to answer properly.

    Mark Powell

    • Fred Schenkelberg says

      January 3, 2019 at 5:33 AM

      Hi Mark, good questions. The longest time to failure is just about 10 years, so let’s say the data was provided to you at 10 years plus the first month. If there were three leap years, then we’d be at the 10-year mark.

      Or, you can assume the report is failure truncated and you have as many failure points as are known and no additional time has elapsed.

      I agree that if all failures stopped over the next 10 years or so, it would change the analysis, yet please make your assumptions clear, as with any analysis.

      Cheers,

      Fred

