---
title: "migrate"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{migrate}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
options(width = 999)

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(migrate)
```

## Using {migrate}

This package is intended to serve as a set of tools to help convert credit risk data at two timepoints into traditional state transition matrices. At a higher level, {migrate} is intended to help an analyst understand how risk moved in their credit portfolio over a time interval.

## Background

One of the more difficult aspects of making a state migration matrix in R (or Python, for that matter) is the fact that the output doesn't satisfy the structure of a traditional data frame object. Rather, the output needs to be a *matrix*, which is a data structure that R does support. In the past, there has been difficulty converting a matrix to something more visual-friendly. More recently, however, tools like the [kableExtra](https://cran.r-project.org/package=kableExtra) and [gt](https://cran.r-project.org/package=gt) packages allow us to present visually appealing output that extends the structure of a data frame. Using the matrix-style output of {migrate}'s functions with a visual formatting package such as the two mentioned above will hopefully help analysts streamline the presentation of their credit portfolio's state migration matrices to an audience.

## Getting Started

If you haven't done so already, first install {migrate} with the instructions in the [README section](https://github.com/ketchbrookanalytics/migrate#Installation).

First, load the package using `library()`

```{r load, eval = FALSE}
library(migrate)
```

The package has a built-in mock dataset, which can be loaded into the environment like so:

```{r data, results = 'hide'}
data("mock_credit")

head(mock_credit[order(mock_credit$customer_id), ])   # sort by 'customer_id'
```

```{r data_tbl, echo = FALSE}
head(mock_credit[order(mock_credit$customer_id), ]) |>
  knitr::kable(row.names = FALSE)
```

Note that an important feature of the `mock_credit` dataset is that there are exactly two (2) unique values in the `date` column variable; if the `time` argument passed to `migrate()` has more than two (2) unique values, the function will throw an error.

```{r dates}
unique(mock_credit$date)
```

To summarize the migration within the data, use the `migrate()` function

```{r migrate}
migrated_df <- migrate(
  data = mock_credit,
  id = customer_id,
  time = date,
  state = risk_rating,
)
head(migrated_df)
```

To create the state transition matrix, use the `build_matrix()` function
```{r matrix}
build_matrix(migrated_df)
```

Or, to do it all in one shot, use the `|>`
```{r pipe}
mock_credit |>
  migrate(
    id = customer_id,
    time = date,
    state = risk_rating,
    metric = principal_balance,
    percent = FALSE,
    verbose = FALSE
  ) |>
  build_matrix(
    state_start = risk_rating_start,
    state_end = risk_rating_end,
    metric = principal_balance
  )
```

## Handle IDs with observations at a single timepoint

The following code creates a dataframe that features 500 customers with the following characteristics:

- 470 customers have a value at both timepoints
- 20 customers have a value only at the first timepoint
- 10 customers have a value only at the second timepoint

```{r}
mock_credit_with_missing <- mock_credit |>
  # Remove the value at the first timepoint for 10 customers
  dplyr::slice(-(1:10)) |>
  # Remove the value at the last timepoint for 20 customers
  dplyr::slice(-((dplyr::n() - 19):dplyr::n()))
```

Check that the new dataframe has information about 500 customers:

```{r}
# Number of unique customer_id values in mock_credit_with_missing
dplyr::n_distinct(mock_credit_with_missing$customer_id)
```

By default, `migrate()` drops observations that belong to IDs found at a single timepoint. `migrate()` informs such behavior through a warning:

```{r}
migrated_data_without_fill_state <- mock_credit_with_missing |>
  migrate(
    id = customer_id,
    time = date,
    state = risk_rating,
    percent = FALSE,
    verbose = FALSE
  )
```

Notice that only 470 customers have been migrated:

```{r}
migrated_data_without_fill_state |>
  dplyr::pull(count) |>
  sum()
```

You can use `migrate()`'s `fill_state` argument to ensure that no information is lost during the migration process. When a *filler state* value (e.g., a character string such as "No Rating" or "NR") is assigned to `fill_state`, IDs with a single timepoint are not removed but rather migrated from or to this *filler state*.

When `verbose = TRUE` a message will provide additional information about the IDs with missing timepoints:

```{r}
migrated_data_with_fill_state <- mock_credit_with_missing |>
  migrate(
    id = customer_id,
    time = date,
    state = risk_rating,
    fill_state = "No Rating",
    percent = FALSE,
    verbose = TRUE
  )
```

Check that 500 customers were migrated:

```{r}
migrated_data_with_fill_state |>
  dplyr::pull(count) |>
  sum()
```

So far we have been using `count` as the metric to easily determine the amount of customers that migrated in each scenario. The following code provides an example migration that leverages `principal_balance` as the metric:

```{r}
mock_credit_with_missing |>
  migrate(
    id = customer_id,
    time = date,
    state = risk_rating,
    metric = principal_balance,
    fill_state = "No Rating",
    percent = FALSE,
    verbose = FALSE
  ) |>
  build_matrix(
    state_start = risk_rating_start,
    state_end = risk_rating_end,
    metric = principal_balance
  )
```