Skip to Tutorial Content

Introduction

Welcome to the tutorial on Life Data Analysis! In this tutorial, you will learn about the basics of life data analysis and how it applies to various fields, such as engineering, manufacturing, and many others.

Learning Objectives

By the end of this module, learners will be able to:

  • Describe the purpose of Weibull analysis in reliability engineering.
  • Differentiate between types of data censoring, including right-censored and interval-censored data.
  • Differentiate between different Weibull models (2-parameter Weibull, 3-parameter Weibull, and Weibayes).
  • Apply Median Rank Regression (MRR) and Maximum Likelihood Estimation (MLE) estimation methods to sample datasets.
  • Interpret results using plotting methods, including probability plots and contour plots.

What is Life Data Analysis?

Life data analysis is the study of how systems function over time. Life data helps explain how long a system will last, when it will likely fail, and how often it will need maintenance. For example, in the manufacturing industry, life data analysis can be used to understand the lifespan of machines and equipment, helping to optimize maintenance schedules and reduce downtime.

Why is Life Data Analysis Important?

Understanding life data analysis helps in predicting the lifespan and maintenance needs of various systems, thereby improving reliability and efficiency. By collecting and modeling life data, we can make predictions about how long machines will last or how long before they will need to be repaired or replaced.

Life Distributions

The Weibull Distribution

The Weibull distribution is a continuous probability distribution used in life data analysis and many other fields. Life data analysis is often referred to as Weibull analysis because the Weibull distribution is commonly used to model life data. While other distributions can be used to model life data, this tutorial focuses on the Weibull distribution because of its wide application.

Why the Weibull Distribution?

The Weibull is a flexible distribution that can fit many different types of data. By adjusting the distribution parameters, the Weibull can mimic other distributions such as the normal, exponential, and log-normal distributions.

Mimicking Other Distributions

To illustrate, take the following normal distribution. The normal distribution is another common distribution used in statistics. The normal is symmetric and characterized by its mean and standard deviation.

The Weibull can mimic the normal distribution by adjusting the shape and scale parameters. Adjust the scale parameter in the slider until the Weibull below looks similar to the Normal above.

Hint: Set Eta equal to 5 as a starting point.

By adjusting the scale parameter, the Weibull can closely resemble the normal distribution.

The Cumulative Distribution Function

In life data analysis, the cumulative distribution function (CDF) describes the probability of survival (or failure) prior to a certain time t. In other words, the CDF gives the probability that a unit will survive (or fail) up to time t.

For the Weibull distribution, the CDF is given by:

\[ R(t) = 1 - F(t) = e^{-(t/\eta)^\beta} \]

Where R(t) is the cumulative probability of survival prior to time t, F(t) is the cumulative probability of failure prior to time t, \(\beta\) is the shape parameter and \(\eta\) is the scale parameter.

Notice how R(t) and F(t) are complementary probabilities, meaning that they add up to 1. For example, if the probability of failure prior to time t is 30%, then the probability of survival prior to time t is 70%.

For the Weibull distribution, \(\beta\) has a physical interpretation related to the failure rate, or the rate at which failures occur over time.

  • \(\beta\) < 1 - the failure rate is decreasing
  • \(\beta\) = 1 - the failure rate is constant
  • \(\beta\) > 1 - the failure rate is increasing

By estimating \(\beta\) from life data, we can gain insights into the failure behavior of the system being analyzed.

Quiz: The Weibull Parameters

The Weibull Parameters

Beta

Beta is the shape parameter of the Weibull distribution. Changing Beta changes the shape of the cumulative distribution function. To illustrate, use the slider below to adjust the shape of the Weibull.

As you adjust Beta, notice how the shape of the Weibull changes. Increasing Beta makes the curve steeper, indicating an increasing failure rate. Decreasing Beta makes the curve less steep, indicating a decreasing failure rate.

Eta

Eta is the scale parameter of the Weibull distribution. Changing Eta changes the scale of the cumulative distribution function. To illustrate, use the slider below to adjust the scale of the Weibull.

As you adjust Eta, notice how the scale of the Weibull changes. Increasing Eta shifts the curve to the right, indicating a longer lifespan. Decreasing Eta shifts the curve to the left, indicating a shorter lifespan.

WeibullR

Getting Started with WeibullR

For this tutorial, we will use WeibullR: an R package for Weibull analysis.

First, check if WeibullR is installed in R and install if not.

pak::pkg_install("WeibullR")

As an example of Weibull analysis, suppose a factory has 5 machines that fail at different times. The factory wants to understand the reliability of the machines over time and predict when future failures may occur. The factory collects the failure data and uses Weibull analysis to model the reliability of the machines. The failure times are 30, 49, 82, 90, and 96 respectively.

To run Weibull analysis in R, first create a vector with the failure data, then use the MLEw2p function with the failure data to fit a Weibull to the data. The argument bounds=TRUE adds confidence bounds to the fit, and show=TRUE displays the results.

failures<-c(30, 49, 82, 90, 96)
fit<-MLEw2p(failures, bounds=TRUE, show=TRUE)

The Weibull Probability Plot

The Weibull Probability Plot above shows the time (or equivalent) on the horizontal axis and the cumulative probability of failure or unreliability on the vertical axis. We know from the cumulative distribution function that the complement of the failure probability is the survival probability or reliability. For example, if the unreliability at time 50 is 20%, then the reliability at time 50 is 100%-20% = 80%.

Note that the horizontal axis on a probability plot is log scale, and the vertical axis is log(log(1-p)) scale, where p is the cumulative probability of failure. By using this scale, the resulting plot of the Weibull data appears linear.

Quiz: Interpreting the Probability Plot

Exercise: Fit a Weibull Distribution

A batch of industrial pumps failed at the following times (hours): 120, 205, 310, 445, 590, 780. Use MLEw2p() to fit a 2-parameter Weibull distribution and display the probability plot.

failures <- c(120, 205, 310, 445, 590, 780)
# Use MLEw2p() with show=TRUE to plot the result
# MLEw2p(failures, show = TRUE)
failures <- c(120, 205, 310, 445, 590, 780)
fit <- MLEw2p(failures, show = TRUE)

Data Censoring

In life data analysis, data is often censored. Censoring occurs when the exact time to failure is not known for all units. There are several types of censoring, including right censoring and interval censoring.

Right Censored Data

Right censored data includes suspensions or units that have operated for a period of time without failure. Suspensions are common in life data analysis because not all units fail during the analysis period.

To illustrate right censored data, suppose that the factory in the previous example has 3 additional machines that did not fail during the analysis period. The machines were suspended (i.e. removed from service) at times 100, 45, and 10 respectively.

To add suspensions to the previous example, create a vector with the suspension data and add it to the MLEw2p function.

suspensions<-c(100, 45, 10)
fit<-MLEw2p(failures, suspensions, bounds=TRUE, show=TRUE)

Note how the fit has changed slightly after adding suspensions. The fit now has a lower beta and higher eta.

Exercise: Right-Censored Fit

Four compressors failed at times 50, 85, 130, and 200 hours. Two additional units were removed from service (suspended) at 250 hours each without failing. Fit a Weibull model using MLEw2p() with suspensions.

failures <- c(50, 85, 130, 200)
suspensions <- c(250, 250)
# Pass both failures and suspensions to MLEw2p()
# MLEw2p(failures, suspensions, show = TRUE)
failures <- c(50, 85, 130, 200)
suspensions <- c(250, 250)
fit <- MLEw2p(failures, suspensions, show = TRUE)

Interval Censored Data

Interval censored data is data where the exact failure times are unknown, but are known to have occurred within certain intervals. An example of interval censored data is inspection data, where parts or machines are inspected periodically for failures. If a failure is detected during an inspection, the exact failure time is unknown, but it is known to have occurred between the last inspection time and the current inspection time.

To illustrate interval censored data, let’s look at an example from Silkworth, 2020. The data consists of part cracks detected during periodic inspections. The left and right columns represent the inspection intervals, and the qty column represents the number of cracks detected in each interval.

inspection_data<-data.frame(left=c(0, 6.12, 19.92, 29.64, 35.4, 39.72, 45.32, 52.32),
                            right=c(6.12, 19.92, 29.64, 35.4, 39.72, 45.32, 52.32, 63.48),
                            qty=c(5, 16, 12, 18, 18, 2, 6, 17))

A number of units did not fail during the inspection period. These units are considered suspensions and are added to the data set as a single entry with the time equal to the last inspection time, event equal to 0 (suspension), and qty equal to the number of suspensions.

suspensions<-data.frame(time=63.48, event=0, qty=73)

To run the interval censored example in WeibullR, use the wblr function with the suspension data and the interval argument to specify the inspection data. Then use the wblr.fit function with the method.fit="mle" argument to fit the model using maximum likelihood estimation. Next, use the wblr.conf function with the method.conf="fm" argument to add confidence bounds using the Fisher matrix method. The col and lty arguments specify the color and line type of the fit and confidence bounds respectively. Use the plot function to create the probability plot.

obj1<-wblr(suspensions, interval = inspection_data)
obj1<-wblr.fit(obj1, method.fit="mle", col="red") 
obj1<-wblr.conf(obj1, method.conf="fm", lty=2)
plot(obj1)

Instead of points, the interval censored data is shown as black horizontal lines representing the intervals where failures occurred. The solid red line is the model fit and the red dashed lines are the confidence bounds.

Grouped Data

Life data is often compiled into groups (i.e. a group of machines failed or survived at a certain time). For example, in the interval censored example above, the inspection data is grouped into intervals with multiple failures in each interval. Likewise, the suspensions are grouped into a single entry with 73 suspensions at the last inspection time.

Let’s look at the inspection data again. The knitr::kable function below displays the inspection data in a table format, and the head function shows the first few rows of the data.

knitr::kable(
  head(inspection_data)
)
left right qty
0.00 6.12 5
6.12 19.92 16
19.92 29.64 12
29.64 35.40 18
35.40 39.72 18
39.72 45.32 2

The qty column indicates how many units failed in each interval. For example, 5 units failed between time 0 and 6.12, 16 units failed between time 6.12 and 19.92, and so on.

Quiz: Interval Censored Data

Parameter Estimation Methods

Parameter estimation methods are used to estimate the parameters of a probability distribution (e.g. Beta and Eta for the Weibull distribution) based on the observed data. The two most common methods are Maximum Likelihood Estimation (MLE) and Median Rank Regression (MRR).

Maximum Likelihood vs Rank Regression

  • Median Rank Regression (MRR) is based on estimating the parameters of the distribution by minimizing the sum of squared errors between the observed and predicted values. The MRR method is also known as the least squares method.

  • Maximum Likelihood Estimation (MLE) is based on estimating the parameters of the distribution by maximizing the likelihood function. The MLE method finds the parameters that make the observed data most probable.

This tutorial does not go into the mathematical details of MRR and MLE, but both methods are widely used in life data analysis. WeibullR includes functions for both methods, such as the MRRw2p, MRRw3p functions for MRR and the MLEw2p, MLEw3p functions for MLE.

Let’s rerun the right censored example and compare the MRR and MLE methods using the wblr function. In the first step, fit the MRR and model using the wblr.fit function with the method.fit argument set to “rr” for MRR. In the second step, fit the MLE model using the method.fit argument set to “mle” for MLE. Finally, use the plot.wblr function to create a multi-plot comparing both models.

MRRfit <- wblr.fit(wblr(failures, suspensions, col="blue"), method.fit="rr")
MLEfit <- wblr.fit(wblr(failures, suspensions, col="red"), method.fit="mle")
plot.wblr(list(MRRfit, MLEfit))

The blue model is the MRR fit and the red model is the MLE fit. While both models fit the data well, there are some slight differences in the parameters and unreliability.

Quiz: MRR vs MLE

Goodness of Fit

After fitting a Weibull model, it is important to assess how well the distribution fits the data. The Anderson-Darling (AD) statistic is a widely used goodness-of-fit metric in life data analysis.

The AD statistic measures the distance between the fitted cumulative distribution function (CDF) and the empirical distribution of the data. A lower AD value indicates a closer fit between the model and the data. WeibullR automatically computes and displays the AD statistic in the legend of every probability plot.

As an informal rule of thumb:

AD Value Interpretation
< 0.3 Good fit
0.3 – 0.6 Acceptable fit
> 0.6 Poor fit

The AD statistic is also useful for comparing fitting methods. For example, fitting the same data with both MRR and MLE allows you to select the method that produces a lower AD — that is, the better-fitting model for your data.

Quiz: Goodness of Fit

Exercise: Compare MRR and MLE Goodness of Fit

Fit the following failure data using both MRR and MLE methods and compare their Anderson-Darling statistics on a combined probability plot.

failures <- c(15, 28, 44, 61, 83, 110, 145)
# Create two wblr objects with different fitting methods, then plot them together
# obj_mrr <- wblr.conf(wblr.fit(wblr(failures, col = "blue"), method.fit = "rr"), method.conf = "fm", lty = 2)
# obj_mle <- wblr.conf(wblr.fit(wblr(failures, col = "red"),  method.fit = "mle"), method.conf = "fm", lty = 2)
# plot.wblr(list(obj_mrr, obj_mle))
failures <- c(15, 28, 44, 61, 83, 110, 145)
obj_mrr <- wblr.conf(wblr.fit(wblr(failures, col = "blue"), method.fit = "rr"),  method.conf = "fm", lty = 2)
obj_mle <- wblr.conf(wblr.fit(wblr(failures, col = "red"),  method.fit = "mle"), method.conf = "fm", lty = 2)
plot.wblr(list(obj_mrr, obj_mle))

Other Weibull Models

The WeiBayes Model

A WeiBayes or one parameter (1P) Weibull has a fixed \(\beta\) or shape parameter based on prior knowledge or experience. Weibayes is a Bayesian approach to Weibull analysis, where prior knowledge is combined with observed data to estimate the parameters of the Weibull distribution. Weibayes is useful when there is limited failure data or when the data does not fit a standard Weibull model well.

As an example, let’s rerun the right censored example of a machine that failed at times 30, 49, 82, 90, and 96. In this case, the data is limited, so a WeiBayes model may be appropriate. First, recreate the failure and suspension data. Then fit the WeiBayes model while assuming a Beta of 2 and plot the results. In the code below, the method.fit argument specifies the WeiBayes model, and the weibayes.beta argument specifies the fixed Beta value.

failures<-c(30, 49, 82, 90, 96)
suspensions<-c(100, 45, 10)
obj <- wblr.fit(wblr(failures, suspensions, col="blue"), method.fit="weibayes", weibayes.beta=2)
plot(obj)

Note that confidence bounds are not shown for the WeiBayes model. Confidence bounds for the WeiBayes model are not well defined because the shape parameter is fixed.

The 3-Parameter Weibull Model

The 3P Weibull has an additional parameter t0 to represent a failure free period. The failure free period is the time before which no failures can occur. Often parts or machines must age, wear, fatigue, etc. before they can actually fail, and the 3P Weibull can model this behavior. The cumulative distribution function for the 3P Weibull is given below:

\[ R(t) = 1 - F(t) = e^-((t-t~0~)/\eta)^\beta \] Where t0 is the failure free period, \(\beta\) is the shape parameter, and \(\eta\) is the scale parameter.

Here is an example from “The New Weibull Handbook” by Robert Abernethy (2004). This example consists of 25 failure times from a machine that has a failure free period. Let’s run this example as a 3P Weibull. First, create a vector of failure data. Then fit the 3P Weibull model and plot the results. The dist="weibull3p" argument specifies the 3P Weibull model.

failures<-c(3.46623, 3.732711, 4.052996, 4.628703, 4.8157, 5.84517, 5.888313, 5.892967,
            8.168362, 10.02799, 10.06062, 10.49785, 11.11493, 11.87369, 12.21122, 12.51854, 
            12.91357, 18.04246, 18.20712, 19.57305, 21.20873, 30.03917, 34.88001, 36.87355,
            53.91168)
fit<- wblr.conf(wblr.fit(wblr(failures), dist="weibull3p"), col="darkgreen")
plot(fit)

Note that the 3P Weibull model has a failure free period of approximately 3.3 time units. The model appears curved at lower probabilities, which is typical of a 3P Weibull. The curvature is due to the failure free period, which shifts the distribution to the right.

Quiz: Other Weibull Models

Multi-Plots

Multi-Plots

Multi-plots are useful for comparing multiple Weibull models on the same plot. In this way, you can easily see the differences between the models and how they fit the data. To build a multi-plot in WeibullR, create multiple wblr objects and add them to a list. Then use the plot.wblr function to create the multi-plot.

First, recreate the failure data from the first example of a machine that failed at times 30, 49, 82, 90, and 96.

failures<-c(30, 49, 82, 90, 96)
obj1<-wblr.fit(wblr(failures), col="red")

Next, recreate the failure and suspension data from the right censored example of a machine that failed at times 30, 49, 82, 90, and 96 with suspensions at times 100, 45, and 10.

failures<-c(30, 49, 82, 90, 96)
suspensions<-c(100, 45, 10)
obj2<-wblr.fit(wblr(failures, suspensions, col="purple"))

Finally, add the 2 wblr objects to a list and plot both objects in a single chart.

plot.wblr(list(obj1, obj2))

The red model is the fit without suspensions and the purple model is the fit with suspensions. Note how the fit with suspensions has a slightly lower beta and higher eta.

Quiz: Multi-Plots

Competing Failure Modes

A part or a system may fail in different ways. A failure mode is a specific way in which a part or machine can fail. For example, a machine may fail due to corrosion, fatigue, or overload. Each failure mode may have a different distribution of time to failure. Competing failure modes occur when multiple failure modes are present in the same data set.

When competing failure modes are present, a basic Weibull model tends to fit the data poorly. This is because the basic Weibull assumes that all failures are due to a single failure mode.

To illustrate, let’s simulate some data with 3 different failure modes, where each failure mode has a different beta and eta. The data.frame below contains 120 observations, where 20 observations are failures and 100 observations are suspensions. The failure_mode column indicates the failure mode for each observation.

set.seed(123)
data <- data.frame(
  time = c(
    rweibull(5, 0.5, 20), # Simulate times for Failure Mode A
    rweibull(10, 1, 10), # Simulate times for Failure Mode B
    rweibull(5, 2, 5),  # Simulate times for Failure Mode C
    rweibull(100, 2, 10) # Simulate times for Suspensions
    ),
  event = c(
    rep(1, 20), # Label Failures
    rep(0, 100) # Label Suspensions
    ),
  failure_mode = c(
    rep("A", 5), # Label Failure Mode A
    rep("B", 10), # Label Failure Mode B
    rep("C", 5), # Label Failure Mode C
    rep("", 100)) # No label for Suspensions
  )

Next, fit a basic Weibull model to the combined data and plot the results.

obj <- wblr.conf(wblr.fit(wblr(data)))
plot(obj, col="darkgreen", is.plot.legend = FALSE)

When the data is combined, the overall fit has a beta of approximately 1, which indicates random failures. However, we know that the data actually contains 3 different failure modes.

Now let’s create a separate data set for each failure mode. The code below creates 3 copies of the original data set and sets the event column to 0 for all observations that do not belong to the respective failure mode.

dat1 <- data
dat2 <- data
dat3 <- data
dat1$event[dat1$failure_mode != "A"] <- 0
dat2$event[dat2$failure_mode != "B"] <- 0
dat3$event[dat3$failure_mode != "C"] <- 0

Finally, fit a wblr object for each failure mode and create a multi-plot. The is.plot.legend=FALSE argument removes the legend from the plot.

obj1 <- wblr.fit(wblr(dat1, col="blue"))
obj2 <- wblr.fit(wblr(dat2, col="red"))
obj3 <- wblr.fit(wblr(dat3, col="orange"))
plot.wblr(list(obj1, obj2, obj3), is.plot.legend = FALSE)

Note how each failure mode has a different beta and eta. Failure mode A (blue) has a beta less than 1, which means the failure rate is decreasing. Failure mode B (red) has a beta greater than 1, which means that the failure rate is increasing. Failure mode C (orange) has a beta close to 1, which means that the failure rate is approximately constant.

Contour Plots

Another way to compare Weibull models is with contour plots. A contour plot is a graphical representation of the relationship between the Weibull parameters Beta and Eta.

Once again, let’s look at the first example of a machine that failed at times 30, 49, 82, 90, and 96. First, recreate the failure data. Then fit the Weibull model and build the contour plot at a probability level of 90%. The wblr.conf function with the method="lrb" argument computes the likelihood ratio bounds for the contour plot, and the plot_contour function creates the contour plot. The CL=0.9 argument specifies a probability level of 90%.

failures<-c(30, 49, 82, 90, 96)
obj <- wblr.conf(wblr.fit(wblr(failures, col="blue"), method="mle"), method="lrb")
plot_contour(obj, CL=0.9)

The Weibull contour plot shows the confidence region for the Weibull parameters Beta and Eta at a probability level of 90%. The x-axis represents Eta, the y-axis represents Beta, and the contour lines represent the confidence levels.

Comparing Contour Plots

To compare multiple Weibull models, we can build contour plots for each model and overlay them on the same chart. If the models are similar, then the contours will overlap. If the models are different, then the contours will not overlap.

To visualize this concept, let’s revisit the previous competing failure mode example. First, recreate the failure data with 3 different failure modes. Then create a separate wblr object for each failure mode and build the contour plot for each model at a probability level of 90%. The contours are then overlaid on the same chart. Note the use of the xlim argument to set the x-axis limits for better visualization.

obj1 <- wblr.conf(wblr.fit(wblr(dat1, col="blue"), method="mle"), method="lrb")
obj2 <- wblr.conf(wblr.fit(wblr(dat2, col="red"), method="mle"), method="lrb")
obj3 <- wblr.conf(wblr.fit(wblr(dat3, col="orange"), method="mle"), method="lrb")
plot_contour(list(obj1, obj2, obj3), CL=0.9, xlim = c(1, 1500))

The blue, red, and orange contours represent the 3 different failure modes. Note how the contours overlap, which indicates that the failure modes are not significantly different. If the contours did not overlap, then the failure modes would be significantly different.

Quiz: Contour Plots

Summary

Congratulations on completing the tutorial on life data analysis!

In this tutorial, we’ve introduced the concepts of life data analysis and reliability. You have learned to calculate reliability using the Weibull distribution and visualize life data.

To Get Help

To get help with a specific function in WeibullR, type a question mark before the function name.

?MRRw2p

For more help with the WeibullR package.

help(package="WeibullR")

Additional Resources

References

  • Abernethy, R.B. (2004) The New Weibull Handbook. Fifth Edition.

  • Aden-Buie G, Schloerke B, Allaire J, Rossell Hayes A (2023). learnr: Interactive Tutorials for R. https://rstudio.github.io/learnr/, https://github.com/rstudio/learnr.

  • Silkworth, David. (2020). WeibullR: An R Package for Weibull Analysis for Reliability Engineers. 43-53. https://doi.org/10.35566/isdsa2019c3.

  • Silkworth D, Symynck J (2022). WeibullR: Weibull Analysis for Reliability Engineering. R package version 1.2.1, https://CRAN.R-project.org/package=WeibullR.

An Introduction to Life Data Analysis