---
title: "Average treatment effect (ATE) for Competing risks and binary outcomes"
author: Klaus Holst & Thomas Scheike
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    fig_caption: yes
    fig_width: 7.15
    fig_height: 5.5        
vignette: >
  %\VignetteIndexEntry{Average treatment effect (ATE) for Competing risks and binary outcomes}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(mets)
```

The `binregATE` function estimates the average treatment effect for binary outcomes with
IPCW adjustment at a specific time-point, and can thus be used for survival or competing risks data.
Computation is linear in data size and influence functions are computed and available.

Key features:

  - the censoring weights can be stratum-dependent
  - predictions can be computed with standard errors
  - computation time is linear in data size, including standard errors
  - cluster-corrected standard errors are available via the `clusters` argument


Average treatment effect 
=========================

First we simulate some data that mimics that of Kumar et al. (2012).
This is data from multiple myeloma patients treated with allogeneic stem cell
transplantation from the Center for International Blood and Marrow Transplant Research
(CIBMTR). The data consisted of patients transplanted from 1995 to
2005, comparing outcomes between transplant periods: 2001--2005 ($N=488$)
versus 1995--2000 ($N=375$). The two competing events were
relapse (cause 2) and treatment-related mortality (TRM, cause 1) defined as death without relapse.
Kumar et al. (2012) considered the following risk covariates:
transplant time period (`gp`, main interest of the study: 1 for transplanted in
2001--2005 versus 0 for transplanted in 1995--2000), donor type (`dnr`: 1 for unrelated or
other related donor ($N=280$) versus 0 for HLA-identical sibling ($N=584$)), prior
autologous transplant (`preauto`: 1 for Auto+Allo transplant ($N=399$) versus 0 for
allogeneic transplant alone ($N=465$)) and time to transplant (`ttt24`: 1 for more than 24 months ($N=289$) versus 0 for less than or
equal to 24 months ($N=575$)).


We generate similar data by assuming that the two cumulative incidence curves are logistic and that
censoring depends on covariates via a Cox model, all wrapped in the `kumarsim` function.
The simulation does not enforce the constraint $F_1+F_2 < 1$, but as we increase the
sample size we still recover the parameters of cause 2.

We start by computing the average treatment effects for the transplant periods (`gp`), then consider
the possible interaction between `gp` and donor type (`dnr`) and estimate all contrasts between the risk estimates for all combinations
of `gp` and `dnr`. The key components are:

  - an outcome model (fitted by IPCW): `Event(...) ~ .....`
  - a propensity score model (using logistic regression): `treat.model=...`
  - an IPCW weight estimated using a stratified Kaplan-Meier model: `cens.model=~strata(..)`


```{r}
library(mets) 
set.seed(100)
###
n <- 400
kumar <- kumarsim(n,depcens=1)
kumar$cause <- kumar$status
kumar$ttt24 <- kumar[,6]
dtable(kumar,~cause)
dfactor(kumar) <- gp.f~gp
kumar$id <- 1:n
kumar$idc <- sample(100,n,TRUE)
kumar$ids <- sample(n,n)
kumar$id2 <- sample(n,n)
kumar2 <- kumar[order(kumar$id2),]
kumar$int <- interaction(kumar$gp,kumar$dnr)
kumar2$int <- interaction(kumar2$gp,kumar2$dnr)
clust <- 0

b2 <- binregATE(Event(time,cause)~gp.f+dnr+preauto+ttt24,kumar,cause=2,
	treat.model=gp.f~dnr+preauto+ttt24,time=40,cens.model=~strata(gp,dnr))
summary(b2)

b5 <- binregATE(Event(time,cause)~int+preauto+ttt24,kumar,cause=2,
		treat.model=int~preauto+ttt24,cens.code=0,time=60)
summary(b5)
```

We note that the estimates found using the large censoring model are very different from those
using the simple Kaplan-Meier weights that are severely biased for these data. This is due to a strong 
censoring dependence. 

The average treatment is around $0.17 = E(Y(1) - Y(0))$ at time 60 for the 
transplant period, under the standard causal assumptions. 
The 1/0 treatment variable used for the causal computation is found as the right hand side (rhs) of the treat.model 
or as the first argument on the rhs of the response model. 

The binregATE default uses binreg with its default to fit the working model and is 
recommended, but the logitIPCW and logitIPCWATE can also be used and are GLM-type 
IPCW weighted models (see binreg help page/vignette).  We can compare the logitIPCWATE/logitIPCW
with that of wglm and its ate function in the riskRegression package, and they agree. 

```{r}
ib2 <- logitIPCWATE(Event(time,cause)~gp.f+dnr+preauto+ttt24,kumar,cause=2,
	treat.model=gp.f~dnr+preauto+ttt24,time=40,cens.model=~strata(gp,dnr))
summary(ib2)

ib5 <- logitIPCW(Event(time,cause)~gp.f+dnr+preauto+ttt24,kumar,cause=2,cens.code=0,
	 time=60,cens.model=~strata(gp,dnr))
summary(ib5)

ibs <- logitIPCW(Event(time,cause)~gp.f+dnr+preauto+ttt24,kumar,cause=2,cens.code=0,time=60)
summary(ibs)

check <- 0
if (check==1) {
require(riskRegression)
e.wglm <- wglm( regressor.event=~gp.f+dnr+preauto+ttt24, formula.censor = Surv(time,cause==0)~+1, times = 60, data = kumar, product.limit=TRUE,cause=2)
summary(e.wglm)$coef
estimate(ibs)

es.wglm <- wglm( regressor.event=~gp.f+dnr+preauto+ttt24, 
formula.censor = Surv(time,cause==0)~strata(gp,dnr), times = 60, 
data = kumar, product.limit=TRUE,cause=2)
summary(es.wglm)$coef
estimate(ib5)
}
```

The cluster argument should not be used for the logitIPCWATE, but works for binregATE.


Average treatment for Competing risks data 
==========================================

The binreg function does direct binomial regression for one time-point, $t$, 
fitting the model 
\begin{align*}
P(T \leq t, \epsilon=1 | X )  & = \mbox{expit}( X^T \beta),
\end{align*}
for possible right censored data. The estimation procedure 
is based on IPCW adjusted estimating equation (EE)
\begin{align*}
 U(\beta) = &  X \left( \Delta(t) I(T \leq t, \epsilon=1 )/G_c(T \wedge t) - \mbox{expit}( X^T beta) \right) = 0 
\end{align*}
where $G_c(t)=P(C > t)$, the censoring survival distribution, and with $\Delta(t) = I( C > T \wedge t)$ the indicator
of being uncensored at time $t$. 

The function logitIPCW instead considers the EE
\begin{align*}
 U(\beta) = &  X  \frac{\Delta(t)}{G_c(T \wedge t)} \left( I(T \leq t, \epsilon=1 ) - \mbox{expit}( X^T beta) \right) = 0.
\end{align*}
The two score equations are quite similar, and exactly the same when the censoring model is fully-nonparametric given $X$.

Additional functions logitATE, and binregATE computes the average treatment effect. We demonstrate their use below. 

The functions binregATE (recommended) and logitATE also works when there is no censoring and we thus have simple binary outcome. 

Variance is based on  sandwich formula with IPCW adjustment, 
and naive.var is variance 
under a known censoring model. The influence functions are stored in the 
output. Further, the standard errors can be
cluster corrected by specifying the relevant cluster for the working outcome model.

 - We estimate the average treatment effect of our binary response $I(T \leq t, \epsilon=1)$
   - Using a working logistic model for the response (possibly with a cluster specification)
   - Using a working logistic model for treatment given covariates
     - The binregATE can also handle a factor with more than two levels and then uses
        the mlogit multinomial regression function (of mets). 
   - Using a working model for censoring given covariates, this must be a stratified Kaplan-Meier.

If there are no censoring then the censoring weights are simply set to 1.

The average treatment effect is 
\begin{align*}
 E(Y(1) - Y(0)) 
\end{align*}
using counterfactual outcomes. 

We compute the simple G-estimator 
\begin{align*}
\sum  m_a(X_i) 
\end{align*}
to estimate the risk $E(Y(a))$. 

The DR-estimator instead uses the estimating equations that are double robust wrt 

   - A working logistic model for the response
   - A working logistic model for treatment given covariates

This is estimated using the estimator 
\begin{align*}
\sum \left[ \frac{A_i Y_i}{\pi_A(X_i)}-\frac{A_i - \pi_A(X_i)}{\pi_A(X_i)} m_1(X_i) \right]
   - \left[ \frac{(1-A_i) Y_i}{1-\pi_A(X_i)}+\frac{A_i - \pi_A(X_i)}{1-\pi_A(X_i)} m_0(X_i) \right]
\end{align*}
where

 - $A_i$ is treatment indicator
 - $\pi_A(X_i) = P(A_i=1|X_i)$ is treatment model 
 - $Y_i$ outcome, that in case of censoring is censoring adjusted $\tilde Y_i \Delta(t) /G_c(T_i- \wedge t)$
 - $\tilde Y_i = I(T_i \leq t, \epsilon_i=1)$ outcome before censoring.
 - $m_j(X_i)=P(Y_i=1| A_i=j,X_i)$ is outcome model, using binomial regression.

 The standard errors are then based on an iid decomposition using taylor-expansions 
 for the
 parameters of the treatment-model and the outcome-model, and the 
 censoring probability. 

 We need that the censoring model is correct, so it can be important to use a sufficiently 
 large censoring model as we also illustrate below. 

   - The censoring model can be specified by strata (used for `phreg`).

We also compute standard marginalization for average treatment effect  (called differenceG) 
\begin{align*}
\sum \left[  m_1(X_i) - m_0(X_i) \right]
\end{align*}
and again standard errors are based on the related influence functions and are also returned. 

For large data where there are more than 2 treatment groups the 
computations can be memory extensive when
there are many covariates due to the multinomial-regression model used for the 
propensity scores. Otherwise
the function (binregATE) will run for large data.


The ATE functions require that the treatment variable, given as the first variable on
the right-hand side of the outcome model, is a factor. The variable is also
identified from the left-hand side of the
treatment model (`treat.model`), which by default
assumes that treatment does not depend on any covariates.


Average treatment effect for binary or continuous responses
===========================================================

In the binary case a binary outcome is specified instead of the survival outcome, and as a
consequence no censoring adjustment is done.

 - the binary/numeric outcome must be a variable in the data frame

The following code runs the analysis (one can also use `binregATE` by coding `cause` without censoring values, i.e. setting `cens.code=2` and `time` large):

```{r}
kumar$cause2 <- 1*(kumar$cause==2)

b3 <- logitATE(cause2~gp.f+dnr+preauto+ttt24,kumar,treat.model=gp.f~dnr+preauto+ttt24)
summary(b3)

###library(targeted)
###b3a <- ate(cause2~gp.f|dnr+preauto+ttt24| dnr+preauto+ttt24,kumar,family=binomial)
###summary(b3a)

## calculate also relative risk
estimate(coef=b3$riskDR,vcov=b3$var.riskDR,f=function(p) p[1]/p[2])
```

Or with continuous response using normal estimating equations

```{r}
b3 <- normalATE(time~gp.f+dnr+preauto+ttt24,kumar,treat.model=gp.f~dnr+preauto+ttt24)
summary(b3)
```


SessionInfo
============


```{r}
sessionInfo()
```