Introduction
Welcome to the tutorial on Repairable Systems Analysis! In this tutorial, we will explore models and methods used to analyze systems that are repaired and returned to service after each failure. These methods are widely used in fleet management, industrial maintenance, and infrastructure planning.
Unlike life data analysis, which focuses on the time to first failure of a non-repairable component, repairable systems analysis models the entire history of recurring failures on a single system or fleet. The goal is to characterize the failure process, detect trends, and predict future failure behavior.
Learning Objectives
By the end of this tutorial, you will be able to:
- Define the distinction between repairable and non-repairable systems.
- Explain the Homogeneous Poisson Process (HPP) and Non-Homogeneous Poisson Process (NHPP) as failure process models.
- Describe the Power Law Process and its parameters.
- Fit and interpret the Crow-AMSAA model using
nhpp()from ReliaGrowR. - Apply the Piecewise NHPP model to detect changes in failure behavior.
- Compute fleet-level exposure using
exposure()from ReliaGrowR. - Estimate the Mean Cumulative Function (MCF) using
mcf()from ReliaGrowR. - Interpret model parameters to characterize deterioration, stability, or improvement.
- Use R to predict future failure counts and recurrence rates.
Repairable vs Non-Repairable Systems
Before applying any model, it is essential to classify the system correctly.
A non-repairable system is discarded or replaced with a new, statistically identical unit after each failure. The unit of analysis is the time to first failure, and Weibull analysis applies. Examples include ball bearings, fuses, and light bulbs.
A repairable system is restored to operating condition after each failure and continues accumulating operating time. The unit of analysis is the sequence of inter-failure times or cumulative failure counts over the system’s life. Examples include diesel generators, industrial pumps, and aircraft engines.
Two idealized repair assumptions are common:
- Perfect repair (“as-good-as-new”): The system is restored to the same condition as a new unit. This is modeled by a renewal process.
- Minimal repair (“as-bad-as-old”): The system is restored to the same condition it was in just before the failure — the failure is fixed, but nothing else changes. This is modeled by a Non-Homogeneous Poisson Process (NHPP).
Most field maintenance data is best described by the minimal repair assumption, making NHPP the standard framework for repairable systems analysis.
The recurrence rate (intensity function) \(\rho(t)\) describes the instantaneous rate of failure occurrence at time \(t\):
\[\rho(t) = \frac{dE[N(t)]}{dt}\]
where \(E[N(t)]\) is the expected cumulative number of failures by time \(t\).
Failure Processes: HPP and NHPP
Homogeneous Poisson Process (HPP)
The Homogeneous Poisson Process assumes a constant failure rate \(\lambda\) throughout the system’s life:
\[E[N(t)] = \lambda t \qquad \rho(t) = \lambda\]
The mean time between failures is simply \(\text{MTBF} = 1/\lambda\). The HPP is appropriate when failures occur at random, with no trend toward improvement or deterioration.
Non-Homogeneous Poisson Process (NHPP)
The Non-Homogeneous Poisson Process allows the intensity function to change with time:
\[E[N(t)] = \Lambda(t) \qquad \rho(t) = \frac{d\Lambda(t)}{dt}\]
When \(\rho(t)\) increases with time the system is deteriorating; when it decreases the system is improving. The Power Law Process is the most widely used NHPP for repairable systems.
Use the controls below to explore how the shape of the NHPP intensity function changes with the beta parameter, compared to a constant HPP rate.
When \(\beta < 1\) the NHPP intensity decreases over time (improving system). When \(\beta = 1\) the NHPP reduces to the HPP (constant rate). When \(\beta > 1\) the intensity increases (aging or deteriorating system).
The Power Law Process
The Power Law Process is an NHPP whose mean value function (expected cumulative failures) follows a power law:
\[E[N(t)] = \lambda t^{\beta}\]
The corresponding intensity function is:
\[\rho(t) = \lambda \beta t^{\beta - 1}\]
Parameter interpretation:
- \(\lambda > 0\) — scale parameter; governs the overall magnitude of the failure rate.
- \(\beta\) — shape parameter:
- \(\beta < 1\): failure intensity is decreasing (system improving).
- \(\beta = 1\): constant intensity, equivalent to HPP.
- \(\beta > 1\): failure intensity is increasing (system deteriorating or aging).
The instantaneous MTBF at time \(t\) is the reciprocal of the intensity:
\[\text{MTBF}(t) = \frac{1}{\rho(t)} = \frac{1}{\lambda \beta t^{\beta - 1}}\]
Given \(n\) failures observed at times \(t_1 < t_2 < \cdots < t_n\) over a total operating time \(T\), the maximum likelihood estimates are:
\[\hat{\beta} = \frac{n}{\displaystyle\sum_{i=1}^{n} \ln(T/t_i)} \qquad \hat{\lambda} = \frac{n}{T^{\hat{\beta}}}\]
NHPP Analysis
The nhpp() function fits the Power Law Process
(Crow-AMSAA model) to cumulative failure data by maximum likelihood,
returning estimates of \(\lambda\) and
\(\beta\). The fitted mean value
function is plotted against cumulative time:
\[E[N(t)] = \hat{\lambda}\, t^{\hat{\beta}}\]
A curve that bends downward (concave) on a cumulative failures vs. time plot indicates \(\hat{\beta} < 1\) — the failure rate is falling. A curve that bends upward (convex) indicates \(\hat{\beta} > 1\) — the failure rate is rising.
The nhpp() function accepts:
time— a numeric vector of cumulative event times.event— an optional numeric vector of event counts at each time (ifNULL, each time is treated as a single event).
The following data represent failure records from a fleet of industrial pumps recorded at 10 inspection intervals.
times <- c(500, 1200, 2100, 3300, 4800, 6500, 8400, 10500, 12800, 15000)
failures <- c(3, 4, 4, 5, 4, 3, 3, 3, 2, 2)
fit <- nhpp(time = times, event = failures)
plot(fit,
main = "Crow-AMSAA Model: Industrial Pump Fleet",
xlab = "Cumulative Operating Hours",
ylab = "Cumulative Failures")

The fitted curve shape directly reveals the failure trend: a concave curve (\(\hat{\beta} < 1\)) means each successive failure takes longer to arrive — the system is improving.
Exercise: Fit a Crow-AMSAA Model
A compressor fleet recorded cumulative operating hours and failure
counts below. Use nhpp() with method = "LS" to
fit a Power Law model and plot the result.
times <- c(1000, 2500, 4000, 5500, 8000)
failures <- c(2, 3, 4, 5, 5)
# Pass time and event to nhpp(), then plot the result
# fit <- nhpp(time = times, event = failures, method = "LS")
# plot(fit)
times <- c(1000, 2500, 4000, 5500, 8000)
failures <- c(2, 3, 4, 5, 5)
fit <- nhpp(time = times, event = failures, method = "LS")
plot(fit, main = "Crow-AMSAA: Compressor Fleet",
xlab = "Cumulative Hours", ylab = "Cumulative Failures")
Piecewise NHPP
Sometimes a system’s failure behavior changes at a specific point in its operating life, for example after a major overhaul, a design modification, or a change in operating environment. The Piecewise NHPP model fits a separate Power Law to each segment of the time axis, separated by one or more breakpoints \(\tau\):
\[E[N(t)] = \begin{cases} \lambda_1 t^{\beta_1} & 0 < t \leq \tau \\ E[N(\tau)] + \lambda_2 (t - \tau)^{\beta_2} & t > \tau \end{cases}\]
Each segment has its own \(\lambda_i\) and \(\beta_i\), allowing the model to capture, for example, a deteriorating early period followed by improvement after an overhaul.
Pass a known breakpoint time as the breaks argument to
nhpp(). The function will fit a separate Power Law to each
segment.
# Extended pump data: fleet underwent a major overhaul at hour 8000
times2 <- c(500, 1200, 2100, 3300, 4800, 6500, 8400, 9200, 10100, 11300,
12800, 14500, 16400, 18500, 20800)
failures2 <- c(3, 4, 4, 5, 6, 6, 5, 3, 3, 2, 2, 2, 2, 1, 1)
fit_pw <- nhpp(time = times2, event = failures2, breaks = 8000, method = "LS")
plot(fit_pw,
main = "Piecewise NHPP: Fleet Overhaul at Hour 8000",
xlab = "Cumulative Operating Hours",
ylab = "Cumulative Failures")

The first segment (before the estimated breakpoint) has a steeper slope (\(\hat{\beta}_1 > 1\), increasing failure rate), while the second segment (post-overhaul) shows a flatter or downward-bending curve, confirming the overhaul was effective in reducing the failure rate.
Note that the breaks argument serves as an initial
estimate for the breakpoint location. The segmented
algorithm optimizes the breakpoint position from the data, so the fitted
breakpoint in the plot may differ slightly from the value passed to
breaks. In the example above, the fitted breakpoint is
estimated near hour 9,820, close to the known overhaul time of
8,000.
Exposure Analysis
When analyzing a fleet of systems rather than a single system, individual units enter and leave observation at different times. Exposure is the total operating time accumulated across all systems that are currently at risk. It answers the question: “How much system-time has been observed at each point in calendar time?”
The exposure() function computes:
- Cumulative exposure — total system-time at risk up to each event time.
- Number of systems at risk — how many units were under observation at each event time.
- Event rate — cumulative events divided by cumulative exposure, providing a fleet-wide failure rate estimate.
The function accepts per-event records with three columns:
id— system identifier.time— cumulative event or end-of-observation time for that system.event— 1 for a failure event, 0 for censoring (end of observation without a failure).
The dataset below records individual failure events for a fleet of five industrial pumps, each observed for 3,000 hours.
pump_data <- data.frame(
id = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5),
time = c(310, 850, 1620, 420, 1050, 2100, 580, 1890, 240, 710, 1380, 2400, 530, 1740),
event = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
)
exp_result <- exposure(id = pump_data$id, time = pump_data$time,
event = pump_data$event)
plot(exp_result)

The exposure plot shows how the total observed system-time accumulates across the fleet. The event rate curve reveals whether the fleet-wide failure rate is increasing, stable, or decreasing over time.
Mean Cumulative Function
The Mean Cumulative Function (MCF) is a non-parametric estimate of the expected cumulative number of failures per system as a function of time. It is the repairable-systems analogue of the Kaplan-Meier estimator in survival analysis: it makes no distributional assumptions about the failure process and can accommodate systems with different observation periods.
The MCF at time \(t\) is estimated using the Nelson-Aalen estimator:
\[\hat{M}(t) = \sum_{t_j \leq t} \frac{d_j}{n_j}\]
where \(d_j\) is the number of failures at time \(t_j\) and \(n_j\) is the number of systems still under observation at \(t_j\). The MCF provides a visual summary of the recurrence trend without requiring a parametric model.
The mcf() function accepts the same per-event data
format as exposure(). Passing end_time ensures
that systems remaining under observation beyond their last failure are
correctly kept in the risk set.
end_times <- c("1" = 3000, "2" = 3000, "3" = 3000, "4" = 3000, "5" = 3000)
mcf_result <- mcf(id = pump_data$id, time = pump_data$time,
event = pump_data$event,
end_time = end_times)
plot(mcf_result,
main = "Mean Cumulative Function: Industrial Pump Fleet",
xlab = "Operating Hours",
ylab = "Expected Cumulative Failures per System")

The slope of the MCF curve reveals the recurrence trend: a steepening slope indicates an increasing failure rate; a flattening slope indicates improvement. Confidence bounds (shown by default) reflect uncertainty in the estimate.
Exercise: Mean Cumulative Function
Three systems were observed with the following failure times (hours).
Each system was observed until hour 2000. Compute and plot the MCF using
mcf().
id <- c(rep("1", 3), rep("2", 4), rep("3", 2))
time <- c(400, 900, 1500, 300, 700, 1200, 1800, 600, 1400)
event <- rep(1, 9)
end_times <- c("1" = 2000, "2" = 2000, "3" = 2000)
# Use mcf() with the id, time, event, and end_time arguments, then plot
# fit <- mcf(id = id, time = time, event = event, end_time = end_times)
# plot(fit)
id <- c(rep("1", 3), rep("2", 4), rep("3", 2))
time <- c(400, 900, 1500, 300, 700, 1200, 1800, 600, 1400)
event <- rep(1, 9)
end_times <- c("1" = 2000, "2" = 2000, "3" = 2000)
fit <- mcf(id = id, time = time, event = event, end_time = end_times)
plot(fit, main = "Mean Cumulative Function", xlab = "Time (hours)", ylab = "MCF")
Interpreting Results and Repair Effectiveness
Reading the Beta Parameter
The estimated \(\hat{\beta}\) from any Power Law fit gives immediate operational insight:
| \(\hat{\beta}\) | Interpretation | Typical cause |
|---|---|---|
| \(< 1\) | Failure rate decreasing | Effective maintenance, infant mortality resolved, improving conditions |
| \(= 1\) | Constant failure rate | Random external shocks dominate, stable operating regime |
| \(> 1\) | Failure rate increasing | Wear-out, accumulating damage, aging components, deteriorating environment |
Cumulative vs Instantaneous MTBF
Two distinct MTBF metrics arise in repairable systems analysis:
- Cumulative MTBF: total operating time divided by total number of failures — a historical average.
- Instantaneous MTBF: the expected time to the next failure if the system continues from its current state.
\[\text{Instantaneous MTBF}(t) = \frac{1}{\hat{\lambda}\,\hat{\beta}\,t^{\hat{\beta}-1}}\]
For \(\hat{\beta} > 1\) the instantaneous MTBF falls below the cumulative average at late times. The system’s near-term reliability is worse than its history suggests.
Predicting Future Failures
The expected number of additional failures from current time \(T\) to a future time \(T + \Delta t\) is:
\[E[\Delta N] = \hat{\lambda}\,(T + \Delta t)^{\hat{\beta}} - \hat{\lambda}\,T^{\hat{\beta}}\]
Use the controls below to explore how the forecast changes with model parameters and horizon.
Exercise: Predict Future Failures
A system has been operating for 5,000 cumulative hours. A Power Law fit yielded λ = 0.0005 and β = 1.2. Calculate the expected number of additional failures in the next 1,000 hours (from hour 5,000 to hour 6,000).
lambda <- 0.0005
beta <- 1.2
T_current <- 5000
T_future <- 6000
# E[ΔN] = lambda * T_future^beta - lambda * T_current^beta
lambda <- 0.0005
beta <- 1.2
T_current <- 5000
T_future <- 6000
expected_failures <- lambda * T_future^beta - lambda * T_current^beta
cat("Expected additional failures:", round(expected_failures, 2), "\n")
Summary
Congratulations on completing the Repairable Systems Analysis tutorial! You have learned how to:
- Distinguish repairable from non-repairable systems and select the appropriate analytical framework.
- Apply the HPP and NHPP as models for recurrent failure data.
- Interpret the Power Law Process parameters \(\lambda\) and \(\beta\) to characterize system failure trends.
- Use
nhpp()to fit and plot the Crow-AMSAA model. - Use
nhpp()withbreaksto model systems with structural changes in failure behavior. - Use
exposure()to compute fleet-level operating time at risk and event rates. - Use
mcf()to non-parametrically estimate the expected cumulative failures per system. - Forecast future failure counts using the fitted Power Law model.
To Get Help
?nhpp
?exposure
?mcf
help(package = "ReliaGrowR")
Additional Resources
- ReliaGrowR package documentation: https://cran.r-project.org/package=ReliaGrowR
- ReliaSoft — Recurrent Event Data Analysis: https://www.reliawiki.org/index.php/Repairable_Systems_Analysis
- Crow (1975) technical report: https://apps.dtic.mil/sti/citations/ADA020296
References
Aden-Buie G, Schloerke B, Allaire J, Rossell Hayes A (2023). learnr: Interactive Tutorials for R. https://rstudio.github.io/learnr/
Ascher H, Feingold H (1984). Repairable Systems Reliability. Marcel Dekker, New York.
Crow, Larry H. (1975). Reliability Analysis for Complex, Repairable Systems. Army Material Systems Analysis Activity, 40. https://apps.dtic.mil/sti/citations/ADA020296
Govan P (2024). ReliaGrowR: Reliability Growth Analysis. R package. https://cran.r-project.org/package=ReliaGrowR
Guo H, Mettas A, Sarakakis G, Niu P (2010). Piecewise NHPP models with maximum likelihood estimation for repairable systems. 2010 Proceedings — Annual Reliability and Maintainability Symposium (RAMS), 1–7. https://doi.org/10.1109/RAMS.2010.5448029
Meeker W. Q., Escobar L. A. (1998). Statistical Methods for Reliability Data. Wiley-Interscience, New York.