| Type: | Package |
| Title: | Fast Kaplan-Meier, Log-Rank, and Hazard Ratio Estimation for Survival Analysis |
| Version: | 0.1.0 |
| Description: | Provides fast alternatives to standard survival analysis functions in the 'survival' package. The package implements a single-time-point Kaplan-Meier estimator (survfit_fast()), a log-rank test (survdiff_fast()), a closed-form hazard ratio estimator based on the Pike-Halley Estimator method (coxph_fast()), and a clinical trial data simulator (simdata_fast()). All functions are designed for repeated evaluation inside large simulation loops, such as adaptive sample-size re-estimation, probability-of-success calculations, and regional consistency evaluation in multi-regional trials. Core computations are implemented in 'C++' via 'Rcpp' for maximum performance. Methodological background is described in Collett (2014, ISBN:9780429196294). |
| License: | MIT + file LICENSE |
| URL: | https://github.com/gosukehommaEX/FastSurvival |
| BugReports: | https://github.com/gosukehommaEX/FastSurvival/issues |
| Encoding: | UTF-8 |
| Language: | en-US |
| Depends: | R (≥ 4.1.0) |
| Imports: | dqrng, Rcpp |
| LinkingTo: | Rcpp, dqrng |
| Suggests: | survival, microbenchmark, testthat (≥ 3.0.0), knitr, rmarkdown |
| Config/testthat/edition: | 3 |
| Config/roxygen2/version: | 8.0.0 |
| VignetteBuilder: | knitr |
| NeedsCompilation: | yes |
| Packaged: | 2026-05-18 23:44:47 UTC; i_lik |
| Author: | Gosuke Homma [aut, cre] |
| Maintainer: | Gosuke Homma <my.name.is.gosuke@gmail.com> |
| Repository: | CRAN |
| Date/Publication: | 2026-05-27 08:40:07 UTC |
FastSurvival: Fast Survival Analysis Functions for Simulation Studies
Description
FastSurvival provides fast alternatives to the standard survival analysis functions in the survival package. Every function is designed for repeated evaluation inside large simulation loops, such as adaptive sample-size re-estimation, probability-of-success calculations, and regional consistency evaluation in multi-regional clinical trials. Core computations are implemented in C++ via Rcpp for maximum performance inside simulation loops.
Details
The package exports four functions. The three analysis functions return
S3-class objects ("survfit_fast", "survdiff_fast",
"coxph_fast") with print() methods that format the results
similarly to the corresponding survival package output. Each object
is internally a numeric vector, so it can be used directly in arithmetic,
subsetting, and aggregation after stripping the class with
unclass.
survfit_fastKaplan-Meier survival probability, standard error, and confidence interval at a single specified time point. The C++ backend locates the evaluation cutoff via binary search and accumulates the Kaplan-Meier product and Greenwood sum in a single scan over event positions only, without constructing intermediate vectors, making it approximately 50 times faster than
survfit()plussummary()for repeated single-time-point evaluations.survdiff_fastLog-rank test statistic (one-sided Z-score or two-sided chi-square) for two-group survival data. The C++ backend uses a two-pointer merge scan over pooled sorted vectors, eliminating the
rank(),tabulate(), andrev(cumsum(rev(...)))overhead of the standard implementation, making it approximately 40 times faster thansurvdiff().coxph_fastClosed-form hazard ratio estimator via the Pike-Halley Estimator method, with Wald confidence interval. The Pike-Halley Estimator anchors at the Pike estimate and applies a single analytic Halley correction to the Cox partial likelihood score, achieving residual error of order O_p(n^{-3/2}) relative to the Cox maximum likelihood estimate. The C++ backend performs group splitting, at-risk counting, and per-distinct-event-time accumulation in a single pass, making it approximately 30 times faster than
coxph().simdata_fastClinical trial data simulator for one- and two-group time-to-event trials. Supports piecewise uniform accrual, simple and piecewise exponential survival and dropout times. C++ backends handle piecewise sampling and two-group interleaving, and random number generation uses dqrng, making it approximately 4 times faster than an equivalent pure-R implementation.
Author(s)
Maintainer: Gosuke Homma my.name.is.gosuke@gmail.com
Authors:
Gosuke Homma my.name.is.gosuke@gmail.com
References
Homma, G. (2025). One step from Pike to Cox: a closed-form hazard ratio estimator. Manuscript under review.
Collett, D. (2014). Modelling Survival Data in Medical Research (3rd ed.). Chapman and Hall/CRC.
See Also
Useful links:
Report bugs at https://github.com/gosukehommaEX/FastSurvival/issues
Fast Closed-Form Hazard Ratio Estimation via the Pike-Halley Estimator
Description
Estimates the hazard ratio for a two-group parallel trial using the
Pike-Halley Estimator, a pure closed-form approximation to the Cox partial
likelihood maximizer. The function returns the point estimate, its standard
error on the log scale, and a Wald-type confidence interval, using output
names consistent with summary(survival::coxph(...)). The C++ backend
accepts pooled sorted vectors directly, performing group splitting and all
accumulation in a single C++ pass without intermediate R-level vector copies.
Usage
coxph_fast(time, event, group, control, conf.int = 0.95, presorted = FALSE)
Arguments
time |
A numeric vector of follow-up times for all subjects (pooled over both groups). |
event |
An integer or numeric vector of event indicators
(1 = event, 0 = censored), aligned with |
group |
A vector of group labels aligned with |
control |
A scalar value indicating which level of |
conf.int |
A single numeric value in (0, 1) specifying the confidence level for the Wald interval. Defaults to 0.95. |
presorted |
A logical value. If |
Details
Let t_k (k = 1, ..., K) denote the distinct observed event times in the pooled sample. At each t_k, let n_T_k and n_C_k be the numbers at risk in the treatment and control groups just before t_k, and let O_T_k and O_C_k be the numbers of events in each group, with n_k = n_T_k + n_C_k and O_k = O_T_k + O_C_k. Define E_T = sum n_T_k O_k / n_k and E_C = sum n_C_k O_k / n_k as the log-rank expected event totals.
The Pike-Halley Estimator is obtained in three steps. First, the Pike anchor is computed as theta_0 = (O_T E_C) / (O_C E_T). Second, the score U_0, the observed information I_0, and the third-order curvature term J_0 of the Breslow partial likelihood are evaluated at eta_0 = log(theta_0):
p_k = n_T_k theta_0 / (n_C_k + n_T_k theta_0) U_0 = sum (O_T_k - O_k p_k) I_0 = sum O_k p_k (1 - p_k) J_0 = sum O_k p_k (1 - p_k) (1 - 2 p_k)
Third, the closed-form Halley correction is applied:
delta_hat = U_0 / I_0 - J_0 U_0^2 / (2 I_0^3) theta_hat = theta_0 exp(delta_hat)
The residual error satisfies |theta_hat - theta_Cox| = O_p(n^{-3/2}), three orders of magnitude faster than the O_p(n^{-1/2}) rate of Peto and Pike, and the per-call cost is approximately thirty times lower than that of the iterative Cox solver (Homma, 2025).
The Wald standard error on the log scale is SE = 1 / sqrt(I_0), where I_0
is the observed information evaluated at the Pike anchor. This is the same
quantity used in the Wald confidence interval reported by
summary(coxph(...)), which is based on the observed information at
the maximum likelihood estimate. Because the Pike anchor lies within
O_p(n^{-1/2}) of the Cox maximum likelihood estimate, the difference
between I_0 and the information at the maximum likelihood estimate is
negligible for the purpose of interval construction.
The C++ core (pihe_core) accepts the pooled sorted data together
with an integer group indicator and performs group splitting, at-risk
counting, and per-distinct-event-time accumulation in a single left-to-right
scan. This eliminates the rev(cumsum(rev(...))), tapply(),
which(), diff(), and group-split vector copies present in the
pure-R version.
The returned object has class "coxph_fast" and is a named numeric
vector of length 5. A print() method formats the result similarly
to summary(coxph(...)).
Value
An object of class "coxph_fast", which is a named numeric
vector of length 5 with elements matching the column names of
summary(coxph(...))$coefficients and
summary(coxph(...))$conf.int:
coefLog hazard ratio log(theta_hat).
exp(coef)Hazard ratio theta_hat (point estimate).
se(coef)Standard error of
coefon the log scale, equal to 1 / sqrt(I_0).lower .95Lower bound of the Wald confidence interval for the hazard ratio. The label reflects
conf.int(e.g.,"lower .90"whenconf.int = 0.90).upper .95Upper bound of the Wald confidence interval.
Returns a vector of NA_real_ values (still with class
"coxph_fast") when the estimate cannot be computed (e.g., no
events, all events in one group, or I_0 = 0).
References
Homma, G. (2025). One step from Pike to Cox: a closed-form hazard ratio estimator. Manuscript under review.
Berry, G., Kitchin, R. M., & Mock, P. A. (1991). A comparison of two simple hazard ratio estimators based on the logrank test. Statistics in Medicine, 10(5), 749-755.
See Also
coxph for the standard iterative Cox estimator.
print.coxph_fast for the print method.
Examples
library(survival)
# Compare coxph_fast with coxph on the ovarian dataset.
# coxph() treats rx as numeric with rx=1 as the reference (control),
# so set control = 1 for a consistent comparison.
fit_fast <- coxph_fast(ovarian$futime, ovarian$fustat, ovarian$rx, control = 1)
fit_fast
fit_cox <- summary(coxph(Surv(futime, fustat) ~ rx, data = ovarian))
cat("coxph_fast HR :", fit_fast["exp(coef)"], "\n")
cat("coxph HR :", fit_cox$coefficients[, "exp(coef)"], "\n")
# presorted = TRUE: sort once outside, reuse inside a loop
ord <- order(ovarian$futime)
coxph_fast(ovarian$futime[ord], ovarian$fustat[ord], ovarian$rx[ord],
control = 1, presorted = TRUE)
library(microbenchmark)
microbenchmark(
coxph_fast = coxph_fast(ovarian$futime, ovarian$fustat, ovarian$rx, 2),
coxph = coxph(Surv(futime, fustat) ~ rx, data = ovarian),
times = 1000
)
Core Kaplan-Meier computation (C++ backend)
Description
Internal C++ function that computes the Kaplan-Meier survival estimate and
the Greenwood variance sum at a specified time point. Binary search locates
the cutoff index; a single left-to-right scan accumulates the KM product
and Greenwood sum over event positions only. Accepts the event vector as
a numeric (double) vector to avoid an as.integer() copy in R.
Uses std::log1p(-1/n_risk) for the KM log-product, which is more
accurate than std::log(1 - 1/n_risk) near n_risk = 1 and avoids one
subtraction per event. Not intended to be called directly by users; use
survfit_fast() instead.
Usage
km_core(t_sorted, e_sorted, t_eval)
Arguments
t_sorted |
A numeric vector of follow-up times sorted in ascending order. |
e_sorted |
A numeric vector of event indicators (1 = event,
0 = censored), aligned with |
t_eval |
A single numeric value specifying the evaluation time point. |
Value
A numeric vector of length 2: c(surv, gw_sum), where
surv is the KM estimate and gw_sum is the Greenwood
variance sum used to compute SE[S(t)] = surv * sqrt(gw_sum).
Returns c(1.0, 0.0) when no events occur up to t_eval.
Core log-rank computation on pooled sorted vectors (C++ backend)
Description
Internal C++ function that computes O1, E1, and V1 for the log-rank test
directly from a pooled sorted dataset and a group indicator. Eliminates
the R-level group-splitting copies (time[is1], time[!is1],
event[is1], event[!is1]) of the previous design by walking
the pooled vector once and updating per-group at-risk counters in C++.
Tied event times are processed atomically. Not intended to be called
directly by users; use survdiff_fast() instead.
Usage
logrank_core(time_sorted, event_sorted, j_sorted)
Arguments
time_sorted |
A numeric vector of pooled follow-up times sorted in ascending order. |
event_sorted |
An integer vector of event indicators (1 = event,
0 = censored), aligned with |
j_sorted |
An integer vector of group indicators (1 = treatment,
0 = control), aligned with |
Value
A numeric vector of length 3: c(O1, E1, V1).
Core PiHE hazard ratio computation (C++ backend)
Description
Internal C++ function that computes the Pike-Halley Estimator (PiHE) for
the hazard ratio. Accepts pooled sorted vectors plus an integer group
indicator, performs group splitting and the two-pointer merge scan
entirely in C++, and returns the quantities needed for the Halley
correction and Wald interval. Not intended to be called directly by
users; use coxph_fast() instead.
Implementation notes: Pass 1 accumulates pooled scalars
(O_T, O_C, E_T, E_C) and stores per-distinct-event-time summaries in a
single struct array (better cache locality than four parallel vectors)
with a single reserve(n) call (no event-count prepass over the
input). Pass 2 walks the saved summaries once at the Pike anchor
theta_0 to compute U_0, I_0, and J_0.
Usage
pihe_core(time_sorted, event_sorted, j_sorted)
Arguments
time_sorted |
A numeric vector of pooled follow-up times sorted in ascending order. |
event_sorted |
An integer vector of event indicators (1 = event,
0 = censored), aligned with |
j_sorted |
An integer vector of group indicators (1 = treatment,
0 = control), aligned with |
Value
A numeric vector of length 4: c(theta_0, U_0, I_0, J_0).
Returns a length-4 vector of NA_real_ when the estimate cannot
be computed.
Print Method for coxph_fast Objects
Description
Formats and prints a coxph_fast object similarly to
summary(survival::coxph(...)), showing the point estimate of the
log hazard ratio, the hazard ratio, the standard error on the log scale,
the Wald z-statistic, the corresponding two-sided p-value, and the Wald
confidence interval for the hazard ratio.
Usage
## S3 method for class 'coxph_fast'
print(x, digits = max(1L, getOption("digits") - 3L), ...)
Arguments
x |
An object of class |
digits |
Number of significant digits to display. Defaults to the
global option |
... |
Additional arguments passed to |
Value
Invisibly returns x.
See Also
Examples
library(survival)
fit <- coxph_fast(ovarian$futime, ovarian$fustat, ovarian$rx, control = 1)
print(fit)
Print Method for survdiff_fast Objects
Description
Formats and prints a survdiff_fast object similarly to
print(survival::survdiff(...)), showing the observed and expected
event counts for the control and treatment groups, the per-group
contributions (O-E)^2 / E and (O-E)^2 / V, the test
statistic, and the corresponding p-value.
Usage
## S3 method for class 'survdiff_fast'
print(x, digits = max(1L, getOption("digits") - 3L), ...)
Arguments
x |
An object of class |
digits |
Number of significant digits to display. Defaults to the
global option |
... |
Additional arguments (currently unused). |
Value
Invisibly returns x.
See Also
Examples
library(survival)
fit <- survdiff_fast(ovarian$futime, ovarian$fustat, ovarian$rx,
control = 1, side = 2)
print(fit)
Print Method for survfit_fast Objects
Description
Formats and prints a survfit_fast object similarly to
print(summary(survival::survfit(...), times = t_eval)), showing the
Kaplan-Meier survival estimate, the Greenwood standard error on the
survival scale, and the confidence interval at the requested evaluation
time.
Usage
## S3 method for class 'survfit_fast'
print(x, digits = max(1L, getOption("digits") - 3L), ...)
Arguments
x |
An object of class |
digits |
Number of significant digits to display. Defaults to the
global option |
... |
Additional arguments (currently unused). |
Value
Invisibly returns x.
See Also
Examples
set.seed(42)
t_raw <- rexp(100, rate = 1 / 10)
e_raw <- rbinom(100, 1, 0.7)
ord <- order(t_raw)
fit <- survfit_fast(t_raw[ord], e_raw[ord], t_eval = 10)
print(fit)
Fast Survival Data Simulation for Clinical Trials
Description
Simulates individual patient data for one- or two-group time-to-event
trials. Patient accrual times are drawn from a piecewise uniform
distribution. Survival and dropout times are drawn from either a simple
exponential or a piecewise exponential distribution, depending on whether
a scalar or vector hazard is supplied. Random number generation uses
dqrng for speed. C++ backends handle piecewise sampling and
two-group interleaving to minimize R-level overhead.
Usage
simdata_fast(
nsim = 1000,
n,
alloc = c(1, 1),
a.time,
a.rate,
e.hazard = NULL,
e.median = NULL,
e.time = NULL,
d.hazard = NULL,
d.median = NULL,
d.time = NULL,
seed = NULL
)
Arguments
nsim |
A positive integer specifying the number of simulation iterations. Default is 1000. |
n |
A positive integer (one group) or a numeric vector of length 2
(two groups: |
alloc |
A numeric vector of length 2 specifying the allocation ratio
|
a.time |
A numeric vector of accrual interval boundaries
|
a.rate |
A numeric vector of accrual rates (patients per time unit)
for each interval. Length must equal |
e.hazard |
Hazard rate(s) for the event time. For a simple exponential,
supply a scalar (one group) or a list of two scalars (two groups). For a
piecewise exponential, supply a numeric vector (one group) or a list of
two numeric vectors (two groups); |
e.median |
Median survival time(s) for the event time. Same structure
as |
e.time |
Interval boundaries for a piecewise exponential event time,
of the form |
d.hazard |
Hazard rate(s) for the dropout time. Same structure as
|
d.median |
Median dropout time(s). Same structure as |
d.time |
Interval boundaries for a piecewise exponential dropout time.
Same structure as |
seed |
A single integer for reproducibility. Passed to
|
Details
For each patient, three times are generated independently:
- Accrual time
Drawn from a piecewise uniform distribution defined by
a.timeanda.rate. The probability of falling in interval j is proportional toa.rate[j] * (a.time[j+1] - a.time[j]).- Survival time
Drawn from a simple exponential distribution when
e.hazardis a scalar (ore.medianis a scalar), or from a piecewise exponential distribution whene.hazardis a vector (withe.timerequired). The inverse-CDF method is used for the piecewise case.- Dropout time
Drawn from the same family as survival time, governed by
d.hazard/d.medianandd.time. Set toInfwhend.hazardandd.medianare bothNULL(no dropout).
The observed time-to-event is tte = pmin(surv_time, dropout_time).
The event indicator is 1 when surv_time <= dropout_time and
0 otherwise. The calendar time is
calendar_time = accrual_time + tte.
Exactly one of e.hazard and e.median must be supplied.
Exactly one of d.hazard and d.median must be supplied when
dropout is modeled; both must be NULL to suppress dropout entirely.
For a two-group trial, group-specific parameters are passed as a list of length 2, where element 1 corresponds to the control group and element 2 to the treatment group. For a one-group trial, plain vectors or scalars are passed directly.
Value
A data.frame with nsim * sum(n) rows and the
following columns:
simSimulation ID (integer, 1 to
nsim).groupGroup label (integer: 1 for control, 2 for treatment). Always 1 for one-group simulations.
accrual_timePatient accrual time from study start.
surv_timeSurvival time from patient entry.
dropout_timeDropout time from patient entry (
Infwhen dropout is not modeled).tteObserved time-to-event:
pmin(surv_time, dropout_time).eventEvent indicator: 1 if
surv_time <= dropout_time, 0 otherwise.calendar_timeCalendar time from study start:
accrual_time + tte.
Examples
# One-group simulation, simple exponential, no dropout
df1 <- simdata_fast(
nsim = 100,
n = 50,
a.time = c(0, 12),
a.rate = 50 / 12,
e.median = 18,
seed = 1
)
head(df1)
# Two-group simulation, simple exponential, with dropout
df2 <- simdata_fast(
nsim = 100,
n = c(100, 100),
a.time = c(0, 6, 12),
a.rate = c(8, 12),
e.median = list(18, 24),
d.hazard = list(0.01, 0.01),
seed = 2
)
head(df2)
# Two-group simulation, piecewise exponential (delayed treatment effect)
df3 <- simdata_fast(
nsim = 100,
n = c(100, 100),
a.time = c(0, 12),
a.rate = 200 / 12,
e.hazard = list(c(0.08, 0.08), c(0.08, 0.04)),
e.time = c(0, 6, Inf),
seed = 3
)
head(df3)
# Two-group via total n + allocation ratio
df4 <- simdata_fast(
nsim = 100,
n = 200,
alloc = c(1, 1),
a.time = c(0, 12),
a.rate = 200 / 12,
e.hazard = list(0.08, 0.05),
seed = 4
)
head(df4)
Fast Log-Rank Test for Two-Group Survival Data
Description
Computes the log-rank test statistic for comparing survival curves between
two groups. Returns either a one-sided Z-score or a two-sided chi-square
statistic. The C++ backend uses a two-pointer merge scan over pooled sorted
vectors, eliminating the rank() call that dominates the pure-R
implementation.
Usage
survdiff_fast(time, event, group, control, side, presorted = FALSE)
Arguments
time |
A numeric vector of follow-up times for all subjects. |
event |
An integer or numeric vector of event indicators
(1 = event, 0 = censored), aligned with |
group |
A vector of group labels aligned with |
control |
A scalar value indicating which level of |
side |
An integer, either 1 or 2. If |
presorted |
A logical value. If |
Details
The log-rank statistic is computed as:
Z = (O_1 - E_1) / sqrt(V_1)
where O_1 is the observed number of events in the treatment group, E_1 is the expected number under the null hypothesis of equal survival, and V_1 is the hypergeometric variance. Tied event times are handled correctly: all subjects sharing the same event time form a tied block, and the block is processed atomically in the two-pointer merge.
When presorted = TRUE, the input vectors are assumed to be sorted in
ascending order of time and the internal order() call is
skipped. When presorted = FALSE (default), sorting is handled
internally. In simulation loops where the data are generated in sorted order,
setting presorted = TRUE avoids one O(n log n) pass.
The C++ core (logrank_core) walks the pooled sorted data with a
single two-pointer scan, maintaining running at-risk counts per group. No
rank vector is constructed, so the dominant O(n log n) cost of rank()
in the pure-R version is removed.
The returned object has class "survdiff_fast" and is a length-one
numeric value (Z-score or chi-square) with the underlying counts O_0, E_0,
O_1, E_1, V_1, the requested side, and the total sample size stored
as attributes. A print() method formats the result similarly to
print(survival::survdiff(...)), displaying observed and expected
event counts for both the control and treatment groups.
Value
An object of class "survdiff_fast", which is a length-one
numeric value with attributes O0, E0, O1, E1,
V1, side, and n. The numeric value is the Z-score
when side = 1, or the chi-square statistic when side = 2.
Returns NA_real_ (still with class "survdiff_fast") when
V1 = 0 (e.g., all events in one group).
References
Mantel, N. (1966). Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemotherapy Reports, 50(3), 163-170.
Peto, R., & Peto, J. (1972). Asymptotically efficient rank invariant test procedures. Journal of the Royal Statistical Society. Series A (General), 135(2), 185-198.
See Also
survdiff for the standard implementation.
print.survdiff_fast for the print method.
Examples
library(survival)
# Two-sided test: compare with survdiff
fit <- survdiff_fast(ovarian$futime, ovarian$fustat, ovarian$rx, 2, side = 2)
fit
chisq_ref <- survdiff(Surv(futime, fustat) ~ rx, data = ovarian)$chisq
cat("survdiff_fast chi-square:", as.numeric(fit), "\n")
cat("survdiff chi-square:", chisq_ref, "\n")
# One-sided test (Z-score)
survdiff_fast(ovarian$futime, ovarian$fustat, ovarian$rx, 2, side = 1)
# presorted = TRUE: sort once outside, reuse inside a loop
ord <- order(ovarian$futime)
survdiff_fast(ovarian$futime[ord], ovarian$fustat[ord], ovarian$rx[ord],
control = 2, side = 2, presorted = TRUE)
library(microbenchmark)
microbenchmark(
survdiff_fast = survdiff_fast(ovarian$futime, ovarian$fustat,
ovarian$rx, 2, side = 2),
survdiff = survdiff(Surv(futime, fustat) ~ rx, data = ovarian),
times = 1000
)
Fast Kaplan-Meier Survival Probability at a Specified Time Point
Description
Computes the Kaplan-Meier survival probability at a specified time point, together with a standard error and confidence interval based on Greenwood's variance formula. The C++ backend performs binary search for the evaluation cutoff and accumulates the Kaplan-Meier product and Greenwood sum in a single scan over event positions only, without constructing intermediate vectors.
Usage
survfit_fast(
t_sorted,
e_sorted,
t_eval,
conf.int = 0.95,
conf.type = "log",
presorted = TRUE
)
Arguments
t_sorted |
A numeric vector of event or censoring times. Must be
sorted in ascending order when |
e_sorted |
An integer or numeric vector of event indicators
(1 = event, 0 = censored), aligned with |
t_eval |
A single numeric value specifying the time point at which the survival probability is evaluated. |
conf.int |
A single numeric value in (0, 1) specifying the confidence level. Defaults to 0.95. |
conf.type |
A character string specifying the confidence interval type.
Must be one of |
presorted |
A logical value. If |
Details
The Kaplan-Meier estimate at time t_eval is defined as the
product-limit estimator evaluated at the largest observed event time less
than or equal to t_eval. If t_eval is smaller than the first
observed event time, S(t) = 1 and the standard error is zero.
The standard error is estimated by Greenwood's formula:
SE[S(t)] = S(t) * sqrt(sum_{t_i <= t, d_i > 0} d_i / (n_i * (n_i - d_i)))
where d_i is the number of events and n_i is the number at risk at time t_i.
The output field std.err follows the convention of
survfit, which reports SE[S(t)] / S(t) (i.e., the
standard error on the log scale) when conf.type != "plain", and
SE[S(t)] when conf.type = "plain". This function always returns
SE[S(t)] (the standard error on the survival scale).
When S(t_eval) = 0 (all subjects have experienced the event by
t_eval), the standard error is zero and the confidence interval
collapses to [0, 0], consistent with survfit.
When presorted = TRUE (default), t_sorted and e_sorted
are assumed to be sorted in ascending order of time. When
presorted = FALSE, the vectors are sorted internally before
computation.
Three confidence interval types are supported via conf.type:
-
"plain": Linear interval on the survival scale, S(t) +/- z * SE. The bounds are clipped to [0, 1]. -
"log": Interval on the log scale (default insurvfit), S(t) * exp(+/- z * SE / S(t)). -
"log-log": Interval on the complementary log-log scale, S(t)^exp(+/- z * SE / (S(t) * log(S(t)))).
The returned object has class "survfit_fast" and is a named numeric
vector of length 4 with the evaluation time t_eval, the confidence
level conf.int, and the confidence interval type conf.type
stored as attributes. A print() method formats the result similarly
to print(summary(survival::survfit(...))).
Value
An object of class "survfit_fast", which is a named numeric
vector of length 4 with elements surv, std.err,
lower, and upper, representing the Kaplan-Meier survival
estimate, the Greenwood standard error SE[S(t)], and the lower and upper
confidence limits at t_eval. The evaluation time, confidence
level, and confidence interval type are stored as attributes
t_eval, conf.int, and conf.type.
Returns a vector of NA_real_ values (still with class
"survfit_fast") when n is zero.
References
Kaplan, E. L., & Meier, P. (1958). Nonparametric Estimation from Incomplete Observations. Journal of the American Statistical Association, 53(282), 457–481.
See Also
survfit for the standard Kaplan-Meier estimator.
print.survfit_fast for the print method.
Examples
set.seed(42)
t_raw <- rexp(100, rate = 1 / 10)
e_raw <- rbinom(100, 1, 0.7)
# presorted = TRUE (default): sort once outside, reuse inside a loop
ord <- order(t_raw)
t_s <- t_raw[ord]
e_s <- e_raw[ord]
survfit_fast(t_s, e_s, t_eval = 10, conf.type = "plain")
survfit_fast(t_s, e_s, t_eval = 10, conf.type = "log")
survfit_fast(t_s, e_s, t_eval = 10, conf.type = "log-log")
# presorted = FALSE: sort internally, convenient for one-off calls
survfit_fast(t_raw, e_raw, t_eval = 10, presorted = FALSE)
# Validation against survival::survfit
library(survival)
fit <- survfit(Surv(t_raw, e_raw) ~ 1, conf.type = "plain")
summary(fit, times = 10)