---
title: "migrate"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{migrate}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r, include = FALSE}
options(width = 999)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(migrate)
```
## Using {migrate}
{migrate} provides a set of tools for converting credit risk data at two timepoints into traditional state transition matrices. At a higher level, {migrate} is intended to help an analyst understand how risk moved in their credit portfolio over a time interval.

## Background

One of the more difficult aspects of building a state migration matrix in R (or Python, for that matter) is that the output does not fit the structure of a traditional data frame. Rather, the output needs to be a *matrix*, a data structure that R supports natively. In the past, it was difficult to convert a matrix into something visually friendly. More recently, however, packages such as [kableExtra](https://cran.r-project.org/package=kableExtra) and [gt](https://cran.r-project.org/package=gt) make it possible to present visually appealing output that extends the structure of a data frame. Combining the matrix-style output of {migrate}'s functions with a formatting package such as these two can help analysts streamline the presentation of their credit portfolio's state migration matrices to an audience.
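
To make the difference between a long data frame and a matrix concrete, here is a minimal base R sketch. It does not use {migrate}; the `toy_ratings` data, its column names, and the ratings in it are made up for illustration.

```{r}
# Toy data: five hypothetical accounts, each rated at two timepoints
toy_ratings <- data.frame(
  id    = rep(c("a", "b", "c", "d", "e"), times = 2),
  time  = rep(c("t1", "t2"), each = 5),
  state = c("AAA", "AAA", "AAA", "BBB", "BBB",   # ratings at t1
            "AAA", "AAA", "BBB", "BBB", "AAA")   # ratings at t2
)

# Put each account's starting and ending state side by side
toy_start <- toy_ratings[toy_ratings$time == "t1", c("id", "state")]
toy_end   <- toy_ratings[toy_ratings$time == "t2", c("id", "state")]
toy_wide  <- merge(toy_start, toy_end, by = "id", suffixes = c("_start", "_end"))

# Cross-tabulate: rows are starting states, columns are ending states
toy_matrix <- unclass(table(toy_wide$state_start, toy_wide$state_end))
toy_matrix
```

The long data frame has one row per account and timepoint, while the matrix has one cell per pair of starting and ending states, which is why it cannot be treated as an ordinary data frame.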

## Getting Started

If you haven't done so already, install {migrate} by following the instructions in the [README](https://github.com/ketchbrookanalytics/migrate#Installation).

Then load the package using `library()`:
```{r load, eval = FALSE}
library(migrate)
```
The package has a built-in mock dataset, which can be loaded into the environment like so:
```{r data, results = 'hide'}
data("mock_credit")
head(mock_credit[order(mock_credit$customer_id), ]) # sort by 'customer_id'
```
```{r data_tbl, echo = FALSE}
head(mock_credit[order(mock_credit$customer_id), ]) |>
  knitr::kable(row.names = FALSE)
```
Note that the `date` column of `mock_credit` contains exactly two (2) unique values. If the column passed to the `time` argument of `migrate()` has more than two (2) unique values, the function throws an error.
```{r dates}
unique(mock_credit$date)
```
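If you want to check your own data before migrating it, a small guard along these lines may help. This is only a sketch: `has_two_timepoints()` is a hypothetical helper, not a {migrate} function, and the dates below are made up for illustration.

```{r}
# Hypothetical helper (not part of {migrate}): confirm that a time column has
# exactly two distinct values before attempting a migration
has_two_timepoints <- function(x) {
  length(unique(x)) == 2L
}

# Made-up dates: the first vector has two distinct values, the second has three
two_dates   <- as.Date(c("2020-06-30", "2020-09-30", "2020-06-30"))
three_dates <- as.Date(c("2020-06-30", "2020-09-30", "2020-12-31"))

has_two_timepoints(two_dates)
has_two_timepoints(three_dates)
```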
To summarize the migration within the data, use the `migrate()` function:
```{r migrate}
migrated_df <- migrate(
  data = mock_credit,
  id = customer_id,
  time = date,
  state = risk_rating
)
head(migrated_df)
```
To create the state transition matrix, use the `build_matrix()` function:
```{r matrix}
build_matrix(migrated_df)
```
Or, to do it all in one shot, use the native pipe operator (`|>`):
```{r pipe}
mock_credit |>
  migrate(
    id = customer_id,
    time = date,
    state = risk_rating,
    metric = principal_balance,
    percent = FALSE,
    verbose = FALSE
  ) |>
  build_matrix(
    state_start = risk_rating_start,
    state_end = risk_rating_end,
    metric = principal_balance
  )
```
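If the native pipe is new to you, the following sketch shows what it does, using base R functions only. The pipe requires R 4.1 or later and passes its left-hand side as the first argument of the call on its right-hand side, which is why `data` is omitted from the `migrate()` call in a piped chain.

```{r}
# These two expressions are equivalent: the nested call reads inside-out,
# while the piped version reads top-to-bottom
nested <- sum(sqrt(c(1, 4, 9)))
piped  <- c(1, 4, 9) |> sqrt() |> sum()

identical(nested, piped)
```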
## Handle IDs with observations at a single timepoint
The following code creates a data frame of 500 customers with the following characteristics:

- 470 customers have a value at both timepoints
- 20 customers have a value only at the first timepoint
- 10 customers have a value only at the second timepoint

```{r}
mock_credit_with_missing <- mock_credit |>
  # Remove the value at the first timepoint for 10 customers
  dplyr::slice(-(1:10)) |>
  # Remove the value at the last timepoint for 20 customers
  dplyr::slice(-((dplyr::n() - 19):dplyr::n()))
```
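One way to verify a breakdown like the one listed above is to cross-tabulate IDs against timepoints. The sketch below uses a tiny made-up data frame (`toy_obs`, a hypothetical name) rather than `mock_credit_with_missing`, so that the expected result is easy to check by eye.

```{r}
# Toy data: "a" and "b" appear at both timepoints, "c" only at the first,
# and "d" only at the second (all values are made up for illustration)
toy_obs <- data.frame(
  id   = c("a", "b", "c", "a", "b", "d"),
  time = c("t1", "t1", "t1", "t2", "t2", "t2")
)

# Cross-tabulate IDs against timepoints, then label each ID by where it appears
presence <- table(toy_obs$id, toy_obs$time) > 0
coverage <- ifelse(
  presence[, "t1"] & presence[, "t2"], "both",
  ifelse(presence[, "t1"], "first only", "second only")
)

# Number of IDs in each group
table(coverage)
```

The same approach should carry over to real data by swapping in your own ID and time columns and the two timepoint labels.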
Check that the new data frame has information about 500 customers:
```{r}
# Number of unique customer_id values in mock_credit_with_missing
dplyr::n_distinct(mock_credit_with_missing$customer_id)
```
By default, `migrate()` drops observations that belong to IDs found at only a single timepoint, and reports this behavior with a warning:
```{r}
migrated_data_without_fill_state <- mock_credit_with_missing |>
  migrate(
    id = customer_id,
    time = date,
    state = risk_rating,
    percent = FALSE,
    verbose = FALSE
  )
```
Notice that only 470 customers have been migrated:
```{r}
migrated_data_without_fill_state |>
  dplyr::pull(count) |>
  sum()
```
You can use `migrate()`'s `fill_state` argument to ensure that no information is lost during the migration process. When a *filler state* value (e.g., a character string such as "No Rating" or "NR") is assigned to `fill_state`, IDs with a single timepoint are not removed but are instead migrated from or to this *filler state*.

When `verbose = TRUE`, a message provides additional information about the IDs with missing timepoints:
```{r}
migrated_data_with_fill_state <- mock_credit_with_missing |>
  migrate(
    id = customer_id,
    time = date,
    state = risk_rating,
    fill_state = "No Rating",
    percent = FALSE,
    verbose = TRUE
  )
```
Check that 500 customers were migrated:
```{r}
migrated_data_with_fill_state |>
  dplyr::pull(count) |>
  sum()
```
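To make the filler-state idea concrete, here is a conceptual base R sketch on made-up data. It is not how {migrate} implements `fill_state` internally; it only illustrates the effect of treating a missing timepoint as its own state. The `toy_states` data frame and its values are assumptions for illustration.

```{r}
# Toy data: "c" is missing at t2 and "d" is missing at t1 (made-up values)
toy_states <- data.frame(
  id    = c("a", "b", "c", "a", "b", "d"),
  time  = c("t1", "t1", "t1", "t2", "t2", "t2"),
  state = c("AAA", "BBB", "BBB", "AAA", "AAA", "BBB")
)

# Build every id-by-time combination, then left-join the observed states onto it
toy_grid <- expand.grid(
  id = unique(toy_states$id),
  time = unique(toy_states$time),
  stringsAsFactors = FALSE
)
toy_filled <- merge(toy_grid, toy_states, by = c("id", "time"), all.x = TRUE)

# Replace the gaps with a filler state instead of dropping those IDs
toy_filled$state[is.na(toy_filled$state)] <- "No Rating"

# Cross-tabulate starting state against ending state, aligning rows by ID
filled_start <- toy_filled[toy_filled$time == "t1", ]
filled_end   <- toy_filled[toy_filled$time == "t2", ]
toy_filled_matrix <- table(
  start = filled_start$state[order(filled_start$id)],
  end   = filled_end$state[order(filled_end$id)]
)
toy_filled_matrix
```

All four toy IDs appear in the result: "c" moves from "BBB" to "No Rating", and "d" moves from "No Rating" to "BBB".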
So far we have been using `count` as the metric to easily determine the number of customers that migrated in each scenario. The following code provides an example migration that uses `principal_balance` as the metric:
```{r}
mock_credit_with_missing |>
  migrate(
    id = customer_id,
    time = date,
    state = risk_rating,
    metric = principal_balance,
    fill_state = "No Rating",
    percent = FALSE,
    verbose = FALSE
  ) |>
  build_matrix(
    state_start = risk_rating_start,
    state_end = risk_rating_end,
    metric = principal_balance
  )
```
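The difference between a count-based and a metric-based matrix can be sketched in base R with `xtabs()`. The `toy_balances` data below is made up, and the sketch arbitrarily assumes a single balance per account; which timepoint's balance {migrate} sums is not covered here, so treat this as an illustration of the idea rather than of the package's internals.

```{r}
# One row per hypothetical account: start state, end state, and one balance
toy_balances <- data.frame(
  state_start = c("AAA", "AAA", "BBB", "BBB"),
  state_end   = c("AAA", "BBB", "BBB", "BBB"),
  balance     = c(100, 250, 400, 50)
)

# Count-based matrix: each account contributes 1 to its cell
count_matrix <- xtabs(~ state_start + state_end, data = toy_balances)

# Metric-based matrix: each account contributes its balance to its cell
balance_matrix <- xtabs(balance ~ state_start + state_end, data = toy_balances)

count_matrix
balance_matrix
```

Two accounts stay in "BBB", so that cell holds 2 in the count matrix and 450 in the balance matrix.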