---
title: "Standardizing Medical Devices Surveillance Data"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Medical Devices Surveillance Data Standardization}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
load(file="../data/maude.rda")
load(file="../data/sales.rda")
library(mds)
```

**Why?**

Medical device event data are messy.

Common challenges include:

- Performing ongoing surveillance on messy data
- Quickly answering simple questions such as: 
    - Are events trending up? 
    - How did my trends look 1 year ago? 2 years ago?
- Incompatibility of various sources of device-events
- Difficulty integrating exposures, a.k.a. denominator data
- Understanding all the possible combinations to analyze
- Application of disproportionality analysis (DPA)
- Documentation of analyses in a auditable, reproducible way

**How?**

The `mds` package provides a standardized framework to address these challenges:

- Standardize events involving medical devices
- Standardize exposures of the device (also known as opportunities for an event to occur, or event denominator)
- Enumerate possible analyses in a flexible way
- Generate times series of analyses for trending over time
- Set up analyses for easy application of disproportionality analysis (DPA)
- Save all files in lightweight `R` files for auditability, documentation, and reproducibility

**Purpose of This Vignette**

- Introduce the basics on using `mds`
- [4 steps](#foursteps) to go from raw device-event and exposure data to [plotting](#quickplot) event counts and rates over time
- How to use the core `mds` functions: [deviceevent()](#de), [exposure()](#ex), [define_analyses()](#da), [time_series()](#ts)
- Explain advanced usage options

**Note on Statistical Algorithms**

`mds` data and analysis standards allow for seamless application of various statistical trending algorithms via the `mdsstat` package (under development).




## Data: MAUDE and Simulated Sales

Our example dataset `maude` was queried from the [FDA MAUDE API](https://open.fda.gov) and contains 535 reported events on bone cement in 2017. Furthermore, a simulated exposure dataset `sales` was generated to provide denominator data for our bone cement events.

```{r}
library(mds)
dim(maude)
dim(sales)
```

    head(maude, 3)
    
```{r, echo=FALSE, results='asis'}
knitr::kable(head(maude, 3))
```

    head(sales, 3)
    
```{r, echo=FALSE, results='asis'}
knitr::kable(head(sales, 3))
```




## Raw Data to Trending in 4 Steps {#foursteps}

The general workflow to go from data to trending over time is as follows:

1. Use `deviceevent()` to standardize device-event data.
1. Use `exposure()` to standardize exposure data (optional).
1. Use `define_analyses()` to enumerate possible analysis combinations.
1. Use `time_series()` to generate counts (and/or rates) by time based on your defined analyses.

### Live Example

```{r}
# Step 1 - Device Events
de <- deviceevent(
  maude,
  time="date_received",
  device_hierarchy=c("device_name", "device_class"),
  event_hierarchy=c("event_type", "medical_specialty_description"),
  key="report_number",
  covariates="region",
  descriptors="_all_")

# Step 2 - Exposures (Optional step)
ex <- exposure(
  sales,
  time="sales_month",
  device_hierarchy="device_name",
  match_levels="region",
  count="sales_volume")

# Step 3 - Define Analyses
da <- define_analyses(
  de,
  device_level="device_name",
  exposure=ex,
  covariates="region")

# Step 4 - Time Series
ts <- time_series(
  da,
  deviceevents=de,
  exposure=ex)
```




## What to Do Next

You may:

- Save your data (`de`, `ex`), analyses (`da`), and time series (`ts`) for documentation
- Summarize your analyses using `summary()` and `define_analyses_dataframe()`
- `plot()` your time series ([plotting options](#po))
- Run trending statistics (and/or use the `mdsstat` package)

### Summarize Defined Analyses

```{r}
summary(da)
```

### Show All Analyses as a Data Frame

```{r}
dadf <- define_analyses_dataframe(da)
```

    head(dadf, 3)
    
```{r, echo=FALSE, results='asis'}
knitr::kable(head(dadf, 3))
```

### Plot Time Series of Counts and Rates {#quickplot}

```{r, fig.show='hold'}
plot(ts[[1]])
plot(ts[[4]], "rate", type='l')
```




## `deviceevent()` to Standardize Device-Event Data {#de}

**Basic Usage**

```{r}
de <- deviceevent(maude, "date_received", c("device_name", "device_class"), c("event_type", "medical_specialty_description"))
```

    head(de, 3)
    
```{r, echo=FALSE, results='asis'}
knitr::kable(head(de, 3))
```

**Advanced Usage**

```{r}
de <- deviceevent(
  maude,
  time="date_received",
  device_hierarchy=c("device_name", "device_class"),
  event_hierarchy=c("event_type", "medical_specialty_description"),
  key="report_number",
  covariates="region",
  descriptors="_all_")
```

    head(de, 3)

```{r, echo=FALSE, results='asis'}
knitr::kable(head(de, 3))
```

#### Required Arguments

`data_frame`
  : is the input device-event data frame. All remaining arguments refer to variables within this data frame.
  
`time`
  : is the name of the variable containing the time of the event. The input format of the time will be flexibly converted into `Date` format.
  
`device_hierarchy`
  : organizes all your device variables into a hierarchy. The hierarchical concept reflects how devices are often nested into progressively more general groups. Set the first variable as the lowest device level that you would like to trend at. `mds` remembers this hierarchy and allows trending at multiple levels as you specify.
  
`event_hierarchy`
  : organizes all your event variables into a hierarchy. Like devices, event variables should be categorical in nature. *Free text descriptions should not be listed here*, but rather in the `descriptors` argument. The hierarchical concept reflects how events are often nested into progressively more general groups. Set the first variable as the lowest event level that you would like to trend at. `mds` remembers this hierarchy and allows trending at multiple levels as you specify. *If your data does not have an event variable*, you will need to create a dummy variable.

#### Optional Arguments

`key`
  : is a unique identifier for each unique event in `data_frame`. If your data pipeline carries over a key variable, it is recommended to specify it here. The `key` allows downstream aggregated analysis to be able to "look up" individual constituent events.

`covariates`
  : are a special group of variables that may be analyzed within device. For instance, declaring `covariates="Region"` will allow analysis of regions within device. These variables should be categorical in nature.

`descriptors`
  : are additional variables that should be retained for the purpose of describing individual events in downstream analysis.

`implant_days`
  : contains the age in days of an implantable device at the time of the event.




## `exposure()` to Standardize Exposure Data {#ex}

Exposure data is meant to support device-event data. As such, the general expectation is that variable values match between exposure and device-event data. For example, 10 exposures for `ev3 Solitaire` in `France` will be matched **exactly** to `ev3 Solitaire` events in `France`, and not to events for `EV3 SOLITAIRE` in `FRANCE`.

**Basic Usage**

```{r}
ex <- exposure(sales, "sales_month", "device_name")
```

    head(ex, 3)

```{r, echo=FALSE, results='asis'}
knitr::kable(head(ex, 3))
```

**Advanced Usage**

```{r}
ex <- exposure(
  sales,
  time="sales_month",
  device_hierarchy="device_name",
  match_levels="region",
  count="sales_volume")
```

    head(ex, 3)

```{r, echo=FALSE, results='asis'}
knitr::kable(head(ex, 3))
```

#### Required Arguments

Note: Although not required, `count` will commonly be used as well.

`data_frame`
  : is the input exposure data frame. All remaining arguments refer to variables within this data frame.
  
`time`
  : is the name of the variable containing the time of the exposure. The input format of the time will be flexibly converted into `Date` format. If exposure will be used, it is **critical** to have sufficient time granularity. For example, if analysis will be done monthly, exposure data must be no less granular than monthly. `mds` does not make assumptions about filling in holes in time!
    
`device_hierarchy`
  : contains all exposure device variables to match to your device-event data. As such, the values within these variables must match exactly to the values within each respective variable in the device-event data `device_hierarchy` parameter.

`event_hierarchy`
  : contains all exposure event variables to match to your device-event data. As such, the values within these variables must match exactly to the values within each respective variable in the device-event data `event_hierarchy` parameter. Exposures at an event level is not common.

#### Optional Arguments

`count`
  : is the most commonly specified optional parameter. It contains the number of exposures. If not specified, the number of rows will be used as a proxy for count.

`key`
  : is a unique identifier for each unique exposure in `data_frame`. If your data pipeline carries over a key variable, it is recommended to specify it here. The `key` allows downstream aggregated analysis to be able to "look up" individual constituent exposure records.

`match_levels`
  : are variables aside from time, device, and event that specify an exposure. A common "match level" is country, if your exposure data is specific by country.




## `define_analyses()` to Enumerate Analysis Combinations {#da}

After standardizing device-event data using `deviceevent()` and, optionally, exposure data using `exposure()`, the next step is to **discover what types of analyses are possible**. This is separated from actually doing the analysis (counting, calculations, statistics, etc.) because:

- You may not need to analyze everything
- You may want to customize specific analyses, such as:
    - Custom date ranges
    - Covariate analysis for certain device-events
    - Analyze by month for one device, by quarter for another device
    - Use exposures for certain devices but not others
- A saved list of defined analyses quickly and easily answers:
    - Which devices and events did I analyze? When?
    - Did I also look at covariates?
    - Over what time period did I analyze?
    - Did I analyze monthly, quarterly, yearly, etc?
    - Did I use exposures?

**Basic Usage**

```{r}
da <- define_analyses(de, "device_name")
```

Note that `define_analyses()` returns a list of individual analyses. Each individual analysis contains a set of instructions. You can view an analysis by submitting `da[[1]]`, `da[[2]]`, etc., but a less cumbersome overview is possible using `summary()` and `define_analyses_dataframe()`.

`r options(warn=-1)`

```{r}
summary(da)
```

`r options(warn=0)`

    head(define_analyses_dataframe(da), 3)

```{r, echo=FALSE, results='asis'}
knitr::kable(head(define_analyses_dataframe(da), 3))
```

**Advanced Usage**

```{r}
da <- define_analyses(
  de,
  device_level="device_name",
  exposure=ex,
  covariates="region")
```

```{r}
summary(da)
```

    head(define_analyses_dataframe(da), 3)

```{r, echo=FALSE, results='asis'}
knitr::kable(head(define_analyses_dataframe(da), 3))
```

#### Required Arguments

`deviceevents`
  : is a standardized device-event data frame. (`class()` should contain `"mds_de"`)
  
`device_level`
  : is the source device variable to analyze by. Access these variable names by submitting `attributes(de)$device_hierarchy`.
  
#### Optional Arguments

`event_level`
  : is the source event variable to analyze by. Access these variable names by submitting `attributes(de)$event_hierarchy`.
  
`exposure`
  : is a standardized exposure data frame. (`class()` should contain `"mde_e"`)
  
`date_level` and `date_level_n`
  : are arguments used together to specify the time interval to analyze by. Default of `"months"` and `1` analyzes by month. Other examples include `"months"` and `12` for yearly, or `"days"` and `7` for weekly.
  
`covariates`
  : are variables to analyze within device. For example, `c("region")` analyzes by each level of `region` within device.
  
`times_to_calc`
  : specifies how many time periods (counting back in time) to analyze for. Time period is defined using `date_level` and `date_level_n`.

### What About Analyses Across All Events, All Devices, etc?

**It is always assumed that analyses at aggregated levels are desired.** (such as analysis of all events for a given device, or analysis of all events across all devices)

Aggregated level analysis is easily recognized by the `"All"` and `"Data"` values in `device_level`, `event_level`, `covariate`, and `covariate_level`.

```{r, echo=FALSE, results='asis'}
knitr::kable(define_analyses_dataframe(da)[c(11:12, 32), ])
```

### How to Customize Analyses

There are several options:

- Select certain analyses from the list (e.g. `da[[c(1:5, 24:27)]])`)
- Rerun `define_analyses()` with different parameter settings
- Edit values within an individual analysis (e.g. `da[[1]]$date_range_exposure['start'] <- as.Date("2016-10-01")`)




## `time_series()` to Generate Counts, Rates, and More {#ts}

Once an analysis has been defined using `define_analyses()`, the analyses instructions can be executed using `time_series()`, returning by defined time periods:

- Counts at the device, event, and *(optionally)* covariate level of interest.
- Event IDs (the `key` parameter from `deviceevent()`) for lookup of individual event records.
- Exposure counts
- Exposure IDs (the `key` parameter from `exposure()`) for lookup of individual exposure records.
- Disproportionality analysis (DPA) related counts:
    - Counts of the device & non-event
    - Counts of the non-device & event
    - Counts of the non-device & non-event

**Basic Usage**

```{r}
ts <- time_series(da, de)
```

Note that `time_series()` returns, in a list, one time series data frame for every analysis. You can select a time series by submitting `ts[[1]]`, `ts[[2]]`, etc.

    head(ts[[1]], 3)

```{r, echo=FALSE, results='asis'}
knitr::kable(head(ts[[1]], 3))
```

**Advanced Usage**

```{r}
ts <- time_series(
  da,
  deviceevents=de,
  exposure=ex)
```

    head(ts[[1]], 3)

```{r, echo=FALSE, results='asis'}
knitr::kable(head(ts[[1]], 3))
```

#### Required Arguments

`analysis`
  : is a single defined analysis (`class()` should contain `"mds_da"`) or a list of defined analysis.
  
`deviceevents`
  : is a standardized device-event data frame (`class()` contains `"mds_de"`). It is typically the same data frame used to generate `analysis`, but can be another `"mds_de"` data frame, such as a cut of the data at a different time. Note if, say, an older dataset is being used, the `analysis` date ranges must correspond.
  
#### Optional Arguments

`exposure`
  : is a standardized exposure data frame  (`class()` contains `"mds_e"`). It is typically the same data frame used to generate `analysis`. Like `deviceevents`, another data frame may be used, but the `analysis` instructions must correspond.
  
`use_hierarchy`
  : is a logical value for whether device and event hierarchies should be used for the calculation of disproportionality analysis (DPA) counts. Submit `?time_series.mds_da` for more details.

### How to Modify Counts & Exposures

It is not uncommon to adjust event and exposure counts, such as with applications of rolling or moving averages. These adjustments should be applied **after** generating time series data frames from `time_series()`.




## `plot()`ing a Time Series {#po}

Plotting an individual time series generated by `time_series()` is simple. Simply call `plot()` on the time series object:

```{r}
plot(ts[[1]])
```

There are a few custom parameters, including:

`mode`
  : with common values of `"nA"` (representing the device-event of interest), `"exposure"`, and `"rate"` (simply `"nA"/"exposure"`). Less common are `"nB"`, `"nC"`, and `"nD"` representing the cell counts of the disproportionality analysis (DPA) contingency table.
  
`xlab`, `ylab`, `main`
  : representing default `plot()` behavior. By default, axes and title labels are inferred directly from the time series.

All other parameters are from `plot.default()`.
