---
title: "Storing and Analyzing Imputed Data with rbmiUtils"
date: "`r Sys.Date()`"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
    number_sections: true
vignette: >
  %\VignetteIndexEntry{Storing and Analyzing Imputed Data with rbmiUtils}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE
)
```

# Introduction

This vignette demonstrates how to:

* Perform multiple imputation using the [`{rbmi}`](https://cran.r-project.org/package=rbmi) package
* Store and modify the imputed data using `{rbmiUtils}`
* Analyze the imputed data using:
  * A standard ANCOVA on a continuous endpoint (`CHG`)
  * A binary responder analysis on `CRIT1FLN` using [`{beeca}`](https://openpharma.github.io/beeca/)

This pattern enables reproducible workflows where imputation and analysis can be separated and revisited independently.

**Related vignettes:**

* [Data Preparation and Validation](data-preparation.html) - Validate data before imputation
* [Efficient Storage of Imputed Data](efficient-storage.html) - Reduce storage for large imputations

# Statistical Context

This approach applies **Rubin's Rules** for inference after multiple imputation (see the [rbmi quickstart vignette](https://CRAN.R-project.org/package=rbmi/vignettes/quickstart.html) for background on the draws/impute/analyse/pool pipeline):

> We derive a responder variable from the imputed CHG score, fit a model to each imputed dataset, extract the marginal effect or other statistic of interest, and combine the results into a single inference using Rubin's combining rules.
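
For reference, Rubin's rules for a scalar estimate can be written as below. The notation is introduced here for illustration only: $\hat\theta_m$ and $\widehat{\mathrm{SE}}_m$ denote the estimate and its standard error from the $m$-th of $M$ imputed datasets.

$$
\bar\theta = \frac{1}{M}\sum_{m=1}^{M}\hat\theta_m, \qquad
\bar W = \frac{1}{M}\sum_{m=1}^{M}\widehat{\mathrm{SE}}_m^{2}, \qquad
B = \frac{1}{M-1}\sum_{m=1}^{M}\left(\hat\theta_m - \bar\theta\right)^{2}
$$

$$
T = \bar W + \left(1 + \frac{1}{M}\right) B
$$

The pooled estimate is $\bar\theta$ and its standard error is $\sqrt{T}$, which combines the average within-imputation variance $\bar W$ with the between-imputation variance $B$.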

---

# Step 1: Setup and Data Preparation

```{r libraries, message = FALSE, warning = FALSE}
library(dplyr)
library(tidyr)
library(readr)
library(purrr)
library(rbmi)
library(beeca)
library(rbmiUtils)
```

```{r seed}
set.seed(1974)
```

```{r load-data}
# ADEFF is an example efficacy dataset shipped with rbmiUtils
data("ADEFF", package = "rbmiUtils")

ADEFF <- ADEFF %>%
  mutate(
    TRT = factor(TRT01P, levels = c("Placebo", "Drug A")),
    USUBJID = factor(USUBJID),
    AVISIT = factor(AVISIT)
  )
```
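
The explicit `levels` argument matters because R otherwise orders factor levels alphabetically, and model functions such as `lm()` and `glm()` treat the first level as the reference under the default treatment contrasts. The following is a minimal sketch on a toy vector (not the `ADEFF` data) showing the difference:

```{r toy-factor-levels, eval = FALSE}
# Toy vector for illustration only
arm <- c("Drug A", "Placebo", "Drug A")

# Default ordering is alphabetical, so "Drug A" would become the first level
default_levels <- levels(factor(arm))

# Explicit ordering puts the intended reference arm first
explicit_levels <- levels(factor(arm, levels = c("Placebo", "Drug A")))

default_levels
explicit_levels
```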


# Step 2: Define Imputation Model

We use [`rbmi::set_vars()`](https://CRAN.R-project.org/package=rbmi/vignettes/quickstart.html) to specify the key variable roles:

```{r define-vars}
vars <- set_vars(
  subjid = "USUBJID",
  visit = "AVISIT",
  group = "TRT",
  outcome = "CHG",
  covariates = c("BASE", "STRATA", "REGION")
)
```

```{r define-method}
method <- method_bayes(
  n_samples = 100,
  control = control_bayes(warmup = 200, thin = 2)
)
```

```{r impute}
dat <- ADEFF %>%
  select(USUBJID, STRATA, REGION, REGIONC, TRT, BASE, CHG, AVISIT)

draws_obj <- draws(data = dat, vars = vars, method = method)

impute_obj <- impute(draws_obj, references = c("Placebo" = "Placebo", "Drug A" = "Placebo"))

ADMI <- get_imputed_data(impute_obj)
```
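
It can help to think of the imputed data as one long, stacked data frame. The sketch below uses a toy data frame and assumes that, as in `ADMI`, an `IMPID` column identifies which imputed dataset each row belongs to. The object name `toy_stacked` and its values are hypothetical.

```{r toy-stacked, eval = FALSE}
# Toy stacked data: 3 imputed datasets, 2 subjects each (values are made up)
toy_stacked <- data.frame(
  IMPID   = rep(1:3, each = 2),
  USUBJID = rep(c("01", "02"), times = 3),
  CHG     = c(1.0, 2.5, 1.0, 3.1, 1.0, 2.2)
)

# Number of rows contributed by each imputed dataset
rows_per_imp <- table(toy_stacked$IMPID)

# Split into a list with one data frame per imputation
by_imp <- split(toy_stacked, toy_stacked$IMPID)

rows_per_imp
length(by_imp)
```

Keeping everything in one data frame means that a single `mutate()` call can derive new variables across all imputed datasets at once.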


# Step 3: Add Responder Variables

A subject is classed as a responder (`CRIT1FLN = 1`) when the change from baseline exceeds 3. Because the flag is derived after imputation, each imputed dataset carries its own responder status.

```{r derive-responder}
ADMI <- ADMI %>%
  mutate(
    CRIT1FLN = ifelse(CHG > 3, 1, 0),
    CRIT1FL = ifelse(CRIT1FLN == 1, "Y", "N"),
    CRIT = "CHG > 3"
  )
```
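
As a quick sanity check on a derivation like this, the responder proportion can be tabulated within each imputed dataset. The following is a sketch on toy data (hypothetical values, with `IMPID` assumed to index the imputed datasets), using base R only:

```{r toy-responder, eval = FALSE}
# Toy data: 2 imputed datasets with 3 rows each (values are made up)
toy_resp <- data.frame(
  IMPID = rep(1:2, each = 3),
  CHG   = c(2.0, 3.0, 4.5, 3.5, 1.0, 5.0)
)

# Strict inequality: a change of exactly 3 does not count as a response
toy_resp$CRIT1FLN <- ifelse(toy_resp$CHG > 3, 1, 0)

# Responder proportion within each imputed dataset
resp_rate <- tapply(toy_resp$CRIT1FLN, toy_resp$IMPID, mean)
resp_rate
```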


# Step 4: Continuous Endpoint Analysis (CHG)

`analyse_mi_data()` applies the analysis function to each imputed dataset in `ADMI`, and `pool()` then combines the per-imputation results.

```{r analyse-chg}
ana_obj_ancova <- analyse_mi_data(
  data = ADMI,
  vars = vars,
  method = method,
  fun = ancova
)
```

```{r pool-chg}
pool_obj_ancova <- pool(ana_obj_ancova)
print(pool_obj_ancova)
```

```{r tidy-chg}
tidy_pool_obj(pool_obj_ancova)
```


# Step 5: Responder Endpoint Analysis (CRIT1FLN)

## Define Analysis Function

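We use [`beeca::get_marginal_effect()`](https://openpharma.github.io/beeca/reference/get_marginal_effect.html) to estimate the marginal treatment effect (here, the risk difference versus placebo) from a covariate-adjusted logistic model, with a robust variance estimate (`method = "Ge"`, `type = "HC0"`):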

```{r gcomp-fun}
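gcomp_responder <- function(data, ...) {
  # Covariate-adjusted logistic regression for the binary responder flag
  model <- glm(CRIT1FLN ~ TRT + BASE + STRATA + REGION, data = data, family = binomial)

  # Marginal risk difference (Drug A vs. Placebo) with a robust variance estimate
  marginal_fit <- get_marginal_effect(
    model,
    trt = "TRT",
    method = "Ge",
    type = "HC0",
    contrast = "diff",
    reference = "Placebo"
  )

  # Return the estimate and standard error in the list format that pool() expects
  res <- marginal_fit$marginal_results
  list(
    trt = list(
      est = res[res$STAT == "diff", "STATVAL"][[1]],
      se = res[res$STAT == "diff_se", "STATVAL"][[1]],
      df = NA  # no degrees of freedom are supplied for this estimate
    )
  )
}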
```
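
To make the idea of a marginal effect concrete, the point estimate can be sketched by hand in base R: fit the logistic model, predict every subject's response probability under each arm in turn, and take the difference of the averages. This is an illustrative sketch on simulated toy data with hypothetical names (`toy`, `RESP`, `as_arm()`); it does not reproduce the variance estimation performed by `{beeca}`.

```{r toy-gcomp, eval = FALSE}
set.seed(1)  # toy data only; unrelated to ADEFF or ADMI
n <- 200
toy <- data.frame(
  TRT  = factor(rep(c("Placebo", "Drug A"), each = n / 2),
                levels = c("Placebo", "Drug A")),
  BASE = rnorm(n)
)

# Simulate a binary response whose probability depends on arm and baseline
lin_pred <- -0.5 + 0.8 * (toy$TRT == "Drug A") + 0.5 * toy$BASE
toy$RESP <- rbinom(n, size = 1, prob = plogis(lin_pred))

fit <- glm(RESP ~ TRT + BASE, data = toy, family = binomial)

# Copy of the data in which every subject is assigned to the given arm
as_arm <- function(arm) {
  out <- toy
  out$TRT <- factor(rep(arm, nrow(out)), levels = levels(toy$TRT))
  out
}

p_placebo <- predict(fit, newdata = as_arm("Placebo"), type = "response")
p_drug    <- predict(fit, newdata = as_arm("Drug A"), type = "response")

# Marginal risk difference: average over the observed covariate distribution
risk_diff <- mean(p_drug) - mean(p_placebo)
risk_diff
```

Averaging predictions over all subjects, rather than reading off a model coefficient, is what makes the estimate marginal rather than conditional on the covariates.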

## Define Variables and Run Analysis

```{r vars-binary}
vars_binary <- set_vars(
  subjid = "USUBJID",
  visit = "AVISIT",
  group = "TRT",
  outcome = "CRIT1FLN",
  covariates = c("BASE", "STRATA", "REGION")
)
```

```{r analyse-binary}
ana_obj_prop <- analyse_mi_data(
  data = ADMI,
  vars = vars_binary,
  method = method,
  fun = gcomp_responder
)
```

```{r pool-binary}
pool_obj_prop <- pool(ana_obj_prop)
print(pool_obj_prop)
```
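
For intuition about what pooling does with the per-imputation estimates and standard errors, here is a minimal base R sketch of Rubin's rules for a point estimate and its standard error. The helper `pool_rubin()` and the toy inputs are hypothetical, and the sketch leaves out the degrees of freedom and confidence interval calculations that `pool()` performs.

```{r toy-rubin, eval = FALSE}
# Hypothetical helper: combine per-imputation estimates and standard errors
pool_rubin <- function(est, se) {
  m     <- length(est)
  q_bar <- mean(est)                     # pooled point estimate
  w_bar <- mean(se^2)                    # average within-imputation variance
  b     <- var(est)                      # between-imputation variance
  total <- w_bar + (1 + 1 / m) * b       # total variance
  c(est = q_bar, se = sqrt(total))
}

# Toy risk differences and standard errors from 3 imputed datasets
toy_est <- c(0.10, 0.12, 0.08)
toy_se  <- c(0.05, 0.05, 0.05)

pool_rubin(toy_est, toy_se)
```

The pooled standard error is always at least as large as the root mean squared per-imputation standard error, because the between-imputation term reflects the extra uncertainty due to missing data.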


# Final Notes

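* The `ADMI` object can be saved and reloaded later, so analyses can be rerun without repeating the imputation step
* Custom analysis functions, such as `gcomp_responder()`, can be passed to `analyse_mi_data()` through the `fun` argument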
* The tidy output from `tidy_pool_obj()` is helpful for reporting and review
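
One way to save an imputed dataset for later reuse is base R's `saveRDS()` and `readRDS()`, which preserve column types such as factors. The sketch below uses a toy data frame and a temporary file; the object names and file location are hypothetical, and a project path would be used in practice.

```{r toy-save, eval = FALSE}
# Toy stand-in for an imputed dataset (values are made up)
toy_admi <- data.frame(
  IMPID = c(1, 1, 2, 2),
  TRT   = factor(c("Placebo", "Drug A", "Placebo", "Drug A"),
                 levels = c("Placebo", "Drug A")),
  CHG   = c(0.5, 1.2, 0.7, 1.1)
)

# Hypothetical location; replace with a path inside your project
path <- tempfile(fileext = ".rds")

saveRDS(toy_admi, path)
reloaded <- readRDS(path)

# Check that the round trip preserved the data
identical(toy_admi, reloaded)
```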

## Efficient Storage

When working with many imputations, consider using `reduce_imputed_data()` to store only the imputed values:

```{r efficient-storage, eval = FALSE}
# Reduce for efficient storage
reduced <- reduce_imputed_data(ADMI, ADEFF, vars)

# Later, expand back for analysis
ADMI_restored <- expand_imputed_data(reduced, ADEFF, vars)
```
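
The idea behind storing only the imputed values can be illustrated in base R. The following is a conceptual sketch on toy data, not the package's implementation: it keeps only the rows whose outcome was missing in the original data, then rebuilds each imputed dataset from the original data plus those rows. All object names and values are hypothetical.

```{r toy-reduce-expand, eval = FALSE}
# Toy original data: subject "03" has a missing outcome
orig <- data.frame(
  USUBJID = c("01", "02", "03"),
  AVISIT  = "Week 24",
  CHG     = c(1.2, -0.5, NA)
)

# Toy stacked imputations: observed values repeat, only the missing one varies
imputed <- data.frame(
  IMPID   = rep(1:2, each = 3),
  USUBJID = rep(orig$USUBJID, times = 2),
  AVISIT  = "Week 24",
  CHG     = c(1.2, -0.5, 0.8, 1.2, -0.5, 1.4)
)

# "Reduce": keep only rows whose outcome was missing in the original data
missing_keys <- orig[is.na(orig$CHG), c("USUBJID", "AVISIT")]
reduced <- merge(imputed, missing_keys, by = c("USUBJID", "AVISIT"))

# "Expand": for each imputation, start from the original data and fill the gaps
expand_one <- function(id) {
  out  <- orig
  fill <- reduced[reduced$IMPID == id, ]
  idx  <- match(paste(fill$USUBJID, fill$AVISIT),
                paste(out$USUBJID, out$AVISIT))
  out$CHG[idx] <- fill$CHG
  cbind(IMPID = id, out)
}
restored <- do.call(rbind, lapply(sort(unique(reduced$IMPID)), expand_one))

nrow(reduced)   # rows stored
nrow(imputed)   # rows in the full stacked data
```

The saving grows with the number of imputations and shrinks with the proportion of missing outcomes, since observed values are stored once rather than once per imputation.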

See the [Efficient Storage vignette](efficient-storage.html) for details.

## See Also

For a guided tutorial walking through the complete pipeline from raw data to regulatory tables, see `vignette('pipeline')`.