6 Validation Helper Functions
6.1 Panel Map
6.1.1 Generate Equal Weights
In the most simple case, to make a panel map, we need only a correspondence between a source nomenclature (std_A
) and target nomenclature (std_B
), which doesn’t have any duplicate rows.
6.1.1.1 Functions
This is a helper function for making a valid Panel Map with equal Mapping Weights from a concordance table.
#' Helper to build equal split panel map
#'
#' Generate panel map using all *distinct* correspondences between two classifications.
#'
#' @param code_dict Data frame containing correspondence between source and destination codes
#' @inheritParams check_weights
#' @param .weights_to (optional) new column name for storing weights that will be applied to. The default name is `split_<<code_in>>`.
#' input values.
#'
#' @return Returns panel map as tibble
#' @export
#'
#' @examples
make_pm_equal <- function(code_dict, code_in, code_out, .weights_to = NULL){
## check and remove for duplicates
<- sum(duplicated(code_dict))
n_dups <- n_dups == 0
no_dup_links if (!no_dup_links) {
message("Removing duplicate code_in/code_out rows")
<- code_dict |>
code_dict ::distinct({{code_in}}, {{code_out}})
dplyr
}
## make column name for weights
<- .weights_to %||% paste("split", deparse(substitute(code_in)), sep = "_")
.weights_to
## make panel map
<- code_dict |>
panel_map ::group_by({{code_in}}) |>
dplyr::mutate(n_dest = dplyr::n(),
dplyr!!.weights_to := 1 / n_dest) |>
::ungroup() |>
dplyr::select(-n_dest)
dplyr
return(panel_map)
}
#' @rdname make_pm_equal
#' @export
<- make_pm_equal make_panel_map_equal
Use this helper on the concordance table defined above:
make_pm_equal(codes_BA, std_A, std_B, "weights")
## # A tibble: 10 × 3
## std_B std_A weights
## <chr> <chr> <dbl>
## 1 A1 x1111 1
## 2 B2 x2222 1
## 3 B2 x3333 1
## 4 C3 x4444 0.25
## 5 C4 x4444 0.25
## 6 C4 x6666 1
## 7 C5 x4444 0.25
## 8 C6 x4444 0.25
## 9 C7 x5555 0.5
## 10 C8 x5555 0.5
This function uses the no_dup_links
flag to removes any duplicate instructions/links, to avoid assigning unequal shares to each target code/category (shown as naive_share
):
library(dplyr)
<- tribble(~code_in, ~code_out,
codes "cake", "piece_01",
"cake", "piece_02",
"cake", "piece_03",
"cake", "piece_03" ## duplicated row
)
|>
codes ## equal share by code_out
mutate(equal_share = 1 / n_distinct(code_out)) |>
## without duplicates removed
group_by(code_in) |>
mutate("n_dest" = n(),
weight := 1 / n_dest) |>
ungroup() |>
select(-n_dest) |>
group_by(code_out) |>
summarise(
weights = paste(weight, collapse = "+"),
naive_share = sum(weight),
equal_share = unique(equal_share)
)
## # A tibble: 3 × 4
## code_out weights naive_share equal_share
## <chr> <chr> <dbl> <dbl>
## 1 piece_01 0.25 0.25 0.333
## 2 piece_02 0.25 0.25 0.333
## 3 piece_03 0.25+0.25 0.5 0.333
6.1.1.2 Tests
::test_that(
testthat"make_pm_equal() works",
{::expect_identical(
testthatmake_pm_equal(equal_pm$codes_BA, std_A, std_B, .weights_to = "weight"), equal_pm$pm_BA)
::expect_no_message(
testthatmake_pm_equal(equal_pm$codes_BA, std_A, std_B, .weights_to = "weight"))
}
)::test_that(
testthat"make_pm_equal() handles duplicate link correctly",
{<- rbind(equal_pm$codes_BA, equal_pm$codes_BA[1, ])
dup_codes_BA ::expect_message(
testthatmake_pm_equal(dup_codes_BA, std_A, std_B)
)::expect_identical(
testthatmake_pm_equal(dup_codes_BA, std_A, std_B, .weights_to = "weight"), equal_pm$pm_BA
)
} )
## Test passed 🌈
## Removing duplicate code_in/code_out rows
## Test passed 🥳
6.2 Data Preparation for Concordance Transformation
Things that are checked for:
- missing values
code_in
andvalue_in
columns
Probably good practice things, but too much hassle to check, so maybe put in vignette?
data_in
should only have one row/obs percode_in
,- ideally
data_in
has only thecode_in
andvalues_in
columns… the rest get dropped (like withdplyr::summarise()
)