6 Validation Helper Functions
6.1 Panel Map
6.1.1 Generate Equal Weights
In the most simple case, to make a panel map, we need only a correspondence between a source nomenclature (std_A) and target nomenclature (std_B), which doesn’t have any duplicate rows.
6.1.1.1 Functions
This is a helper function for making a valid Panel Map with equal Mapping Weights from a concordance table.
#' Helper to build equal split panel map
#'
#' Generate panel map using all *distinct* correspondences between two classifications.
#'
#' @param code_dict Data frame containing correspondence between source and destination codes
#' @inheritParams check_weights
#' @param .weights_to (optional) new column name for storing weights that will be applied to. The default name is `split_<<code_in>>`.
#' input values.
#'
#' @return Returns panel map as tibble
#' @export
#'
#' @examples
make_pm_equal <- function(code_dict, code_in, code_out, .weights_to = NULL){
## check and remove for duplicates
n_dups <- sum(duplicated(code_dict))
no_dup_links <- n_dups == 0
if (!no_dup_links) {
message("Removing duplicate code_in/code_out rows")
code_dict <- code_dict |>
dplyr::distinct({{code_in}}, {{code_out}})
}
## make column name for weights
.weights_to <- .weights_to %||% paste("split", deparse(substitute(code_in)), sep = "_")
## make panel map
panel_map <- code_dict |>
dplyr::group_by({{code_in}}) |>
dplyr::mutate(n_dest = dplyr::n(),
!!.weights_to := 1 / n_dest) |>
dplyr::ungroup() |>
dplyr::select(-n_dest)
return(panel_map)
}
#' @rdname make_pm_equal
#' @export
make_panel_map_equal <- make_pm_equalUse this helper on the concordance table defined above:
make_pm_equal(codes_BA, std_A, std_B, "weights")## # A tibble: 10 × 3
## std_B std_A weights
## <chr> <chr> <dbl>
## 1 A1 x1111 1
## 2 B2 x2222 1
## 3 B2 x3333 1
## 4 C3 x4444 0.25
## 5 C4 x4444 0.25
## 6 C4 x6666 1
## 7 C5 x4444 0.25
## 8 C6 x4444 0.25
## 9 C7 x5555 0.5
## 10 C8 x5555 0.5
This function uses the no_dup_links flag to removes any duplicate instructions/links, to avoid assigning unequal shares to each target code/category (shown as naive_share):
library(dplyr)
codes <- tribble(~code_in, ~code_out,
"cake", "piece_01",
"cake", "piece_02",
"cake", "piece_03",
"cake", "piece_03" ## duplicated row
)
codes |>
## equal share by code_out
mutate(equal_share = 1 / n_distinct(code_out)) |>
## without duplicates removed
group_by(code_in) |>
mutate("n_dest" = n(),
weight := 1 / n_dest) |>
ungroup() |>
select(-n_dest) |>
group_by(code_out) |>
summarise(
weights = paste(weight, collapse = "+"),
naive_share = sum(weight),
equal_share = unique(equal_share)
)## # A tibble: 3 × 4
## code_out weights naive_share equal_share
## <chr> <chr> <dbl> <dbl>
## 1 piece_01 0.25 0.25 0.333
## 2 piece_02 0.25 0.25 0.333
## 3 piece_03 0.25+0.25 0.5 0.333
6.1.1.2 Tests
testthat::test_that(
"make_pm_equal() works",
{
testthat::expect_identical(
make_pm_equal(equal_pm$codes_BA, std_A, std_B, .weights_to = "weight"), equal_pm$pm_BA)
testthat::expect_no_message(
make_pm_equal(equal_pm$codes_BA, std_A, std_B, .weights_to = "weight"))
}
)
testthat::test_that(
"make_pm_equal() handles duplicate link correctly",
{
dup_codes_BA <- rbind(equal_pm$codes_BA, equal_pm$codes_BA[1, ])
testthat::expect_message(
make_pm_equal(dup_codes_BA, std_A, std_B)
)
testthat::expect_identical(
make_pm_equal(dup_codes_BA, std_A, std_B, .weights_to = "weight"), equal_pm$pm_BA
)
}
)## Test passed 🌈
## Removing duplicate code_in/code_out rows
## Test passed 🥳
6.2 Data Preparation for Concordance Transformation
Things that are checked for:
- missing values
code_inandvalue_incolumns
Probably good practice things, but too much hassle to check, so maybe put in vignette?
data_inshould only have one row/obs percode_in,- ideally
data_inhas only thecode_inandvalues_incolumns… the rest get dropped (like withdplyr::summarise())