---
title: "migrate"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{migrate}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
options(width = 999)
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(migrate)
```

## Using {migrate}

This package is a set of tools for converting credit risk data at two timepoints into traditional state transition matrices. At a higher level, {migrate} is intended to help an analyst understand how risk moved in their credit portfolio over a time interval.

## Background

One of the more difficult aspects of making a state migration matrix in R (or Python, for that matter) is that the output doesn't fit the structure of a traditional data frame object. Rather, the output needs to be a *matrix*, which is a data structure that R does support. In the past, it has been difficult to convert a matrix into something more presentation-friendly. More recently, however, tools like the [kableExtra](https://cran.r-project.org/package=kableExtra) and [gt](https://cran.r-project.org/package=gt) packages allow us to present visually appealing output that extends the structure of a data frame. Using the matrix-style output of {migrate}'s functions with a visual formatting package such as the two mentioned above will hopefully help analysts streamline the presentation of their credit portfolio's state migration matrices to an audience.

## Getting Started

If you haven't done so already, first install {migrate} with the instructions in the [README section](https://github.com/ketchbrookanalytics/migrate#Installation).
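As a convenience, the installation step might look like the sketch below. It assumes that the package is published on CRAN under the name `migrate` and that the {remotes} package is used for the development version from the repository linked above; the README remains the authoritative source for installation instructions.

```r
# Assumption: a released version is available from CRAN
install.packages("migrate")

# Assumption: the development version lives in the GitHub repository
# linked in the README; this requires the {remotes} package
# install.packages("remotes")
remotes::install_github("ketchbrookanalytics/migrate")
```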
First, load the package using `library()`:

```{r load, eval = FALSE}
library(migrate)
```

The package has a built-in mock dataset, which can be loaded into the environment like so:

```{r data, results = 'hide'}
data("mock_credit")
head(mock_credit[order(mock_credit$customer_id), ])   # sort by 'customer_id'
```

```{r data_tbl, echo = FALSE}
head(mock_credit[order(mock_credit$customer_id), ]) |>
  knitr::kable(row.names = FALSE)
```

Note that the `mock_credit` dataset has exactly two (2) unique values in its `date` column. This matters because `migrate()` throws an error if the column passed to its `time` argument has more than two (2) unique values.

```{r dates}
unique(mock_credit$date)
```

To summarize the migration within the data, use the `migrate()` function:

```{r migrate}
migrated_df <- migrate(
  data = mock_credit,
  id = customer_id,
  time = date,
  state = risk_rating
)

head(migrated_df)
```

To create the state transition matrix, use the `build_matrix()` function:

```{r matrix}
build_matrix(migrated_df)
```

Or, to do it all in one shot, use the native pipe, `|>`:

```{r pipe}
mock_credit |>
  migrate(
    id = customer_id,
    time = date,
    state = risk_rating,
    metric = principal_balance,
    percent = FALSE,
    verbose = FALSE
  ) |>
  build_matrix(
    state_start = risk_rating_start,
    state_end = risk_rating_end,
    metric = principal_balance
  )
```

## Handle IDs with observations at a single timepoint

The following code creates a data frame of 500 customers with the following characteristics:

- 470 customers have a value at both timepoints
- 20 customers have a value only at the first timepoint
- 10 customers have a value only at the second timepoint

```{r}
mock_credit_with_missing <- mock_credit |>
  # Remove the value at the first timepoint for 10 customers
  dplyr::slice(-(1:10)) |>
  # Remove the value at the last timepoint for 20 customers
  dplyr::slice(-((dplyr::n() - 19):dplyr::n()))
```

Check that the new data frame still contains 500 customers:

```{r}
# Number of unique customer_id values in mock_credit_with_missing
dplyr::n_distinct(mock_credit_with_missing$customer_id)
```

By default, `migrate()` drops observations that belong to IDs found at only a single timepoint, and it signals this behavior with a warning:

```{r}
migrated_data_without_fill_state <- mock_credit_with_missing |>
  migrate(
    id = customer_id,
    time = date,
    state = risk_rating,
    percent = FALSE,
    verbose = FALSE
  )
```

Notice that only 470 customers have been migrated:

```{r}
migrated_data_without_fill_state |>
  dplyr::pull(count) |>
  sum()
```

You can use `migrate()`'s `fill_state` argument to ensure that no information is lost during the migration process. When a *filler state* value (e.g., a character string such as "No Rating" or "NR") is assigned to `fill_state`, IDs with a single timepoint are not removed but are instead migrated from or to this *filler state*. When `verbose = TRUE`, a message provides additional information about the IDs with missing timepoints:

```{r}
migrated_data_with_fill_state <- mock_credit_with_missing |>
  migrate(
    id = customer_id,
    time = date,
    state = risk_rating,
    fill_state = "No Rating",
    percent = FALSE,
    verbose = TRUE
  )
```

Check that 500 customers were migrated:

```{r}
migrated_data_with_fill_state |>
  dplyr::pull(count) |>
  sum()
```

So far we have used `count` as the metric, which makes it easy to see the number of customers that migrated in each scenario. The following code provides an example migration that uses `principal_balance` as the metric instead:

```{r}
mock_credit_with_missing |>
  migrate(
    id = customer_id,
    time = date,
    state = risk_rating,
    metric = principal_balance,
    fill_state = "No Rating",
    percent = FALSE,
    verbose = FALSE
  ) |>
  build_matrix(
    state_start = risk_rating_start,
    state_end = risk_rating_end,
    metric = principal_balance
  )
```
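To build some intuition for what a filler state does, the base R sketch below works through the idea on a tiny toy dataset. It is only an illustration under stated assumptions, not {migrate}'s actual implementation: the `toy` data frame, its column names, and the use of `merge()` and `table()` are all hypothetical choices made for this example.

```r
# Toy data (hypothetical): customer "c" has no value at the second
# timepoint, and customer "d" has no value at the first timepoint
toy <- data.frame(
  id    = c("a", "a", "b", "b", "c", "d"),
  date  = as.Date(c("2020-06-30", "2020-09-30", "2020-06-30",
                    "2020-09-30", "2020-06-30", "2020-09-30")),
  state = c("AAA", "AA", "AA", "AA", "AAA", "AA"),
  stringsAsFactors = FALSE
)

# Split the data into the start and end timepoints
dates <- sort(unique(toy$date))
start <- toy[toy$date == dates[1], c("id", "state")]
end   <- toy[toy$date == dates[2], c("id", "state")]

# A full outer join keeps IDs seen at only one timepoint;
# their missing state shows up as NA
joined <- merge(start, end, by = "id", all = TRUE,
                suffixes = c("_start", "_end"))

# Replace the missing states with the filler label
filler <- "No Rating"
joined$state_start[is.na(joined$state_start)] <- filler
joined$state_end[is.na(joined$state_end)] <- filler

# Cross-tabulate start vs. end states to get a count-based transition matrix
lvls <- c("AAA", "AA", filler)
transition_counts <- table(
  start = factor(joined$state_start, levels = lvls),
  end   = factor(joined$state_end, levels = lvls)
)

transition_counts
```

In this sketch, all four toy customers appear in the resulting table, with customer "c" counted as moving from "AAA" to the filler state and customer "d" counted as moving from the filler state to "AA". Replacing the full outer join with an inner join (`all = FALSE`) would drop both of them, which mirrors the default behavior described above.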