Checking existing pipelines
xmap
offers verification functions for mappings encoded
in named vectors or lists as well as data frames. Many of these checks
will seem trivial in short mappings, but can be useful checks for longer
or more complex mappings.
For example, if you are using a named list to manually recode
categorical variables (e.g. using forcats::fct_recode
),
then you might want to verify some properties of the level mappings –
e.g. each existing level is only mapped once:
library(forcats)
library(xmap)
x <- factor(c("apple", "bear", "banana", "dear"))
levels <- ## new = "old" levels
c(fruit = "apple", fruit = "banana") |>
verify_named_all_values_unique()
forcats::fct_recode(x, !!!levels)
## [1] fruit bear fruit dear
## Levels: fruit bear dear
mistake_levels <- c(fruit = "apple", fruit = "banana", veg = "banana") |>
verify_named_all_values_unique()
## Error in `verify_named_all_values_unique()`:
## ! Duplicated values found in `x`.
## ℹ Use `base::duplicated(unlist(unname(x)))` to identify duplicates.
Similarly, if you are using a named list to encode group members, you could check the members against a reference list.
## mistakenly assign kate to another group
student_group_mistake <- list(GRP1 = c("kate", "jane", "peter"),
GRP2 = c("terry", "ben", "grace"),
GRP2 = c("cindy", "lucy", "kate" ))
student_list <- c("kate", "jane", "peter", "terry", "ben",
"grace", "cindy", "lucy", "alex")
student_group_mistake |>
verify_named_matchset_values_exact(student_list)
## Error in `verify_named_matchset_values_exact()`:
## ! The values of `x` do not exactly match `ref_set`
If you are redistributing numeric values between categories, see
vignette("making-xmaps")
for details on how to check your
transformation weights redistribute exactly 100% of your original
data.
Crossmap transformations
All crossmaps transformations can be decomposed into multiple “standard” data manipulation steps.
- Rename original categories into target categories
- Mutate source node values by link weight.
- Summarise mutated values by target node.
To implement these steps use the following dplyr
pipeline with verified links:
-
dplyr::left_join
the target categories (to
) to the original data via the source labels (from
). -
dplyr::mutate
the source values by multiplying them with the link weights (weights
) to transform the original values into redistributed values. -
dplyr::group_by
anddplyr::summarise
values by target groups (to
) to complete any many-to-1 mappings.
For example given some original data (v19_data
) with
source categories (v19
), and a valid xmap
with
source categories (from = version19
), target categories
(to = version18
) and link weights
(weights = w19to18
):
# original data
v19_data <- tibble::tribble(
~v19, ~count,
"1120", 300,
"1121", 400,
"1130", 200,
"1200", 600
)
# valid crossmap
xmap_19to18 <- tibble::tribble(
~version19, ~version18, ~w19to18,
# many-to-1 collapsing
"1120", "A2", 1,
"1121", "A2", 1,
# 1-to-1 recoding
"1130", "A3", 1,
# 1-to-many redistribution
"1200", "A4", 0.6,
"1200", "A5", 0.4
) |>
verify_links_as_xmap(from = version19, to = version18, weights = w19to18)
# transformed data
(v18_data <- dplyr::left_join(x = v19_data,
y = xmap_19to18,
by = c(v19 = "version19")) |>
dplyr::mutate(new_count = count * w19to18) |>
dplyr::group_by(version18) |>
dplyr::summarise(v20_count = sum(new_count))
)
## # A tibble: 4 × 2
## version18 v20_count
## <chr> <dbl>
## 1 A2 700
## 2 A3 200
## 3 A4 360
## 4 A5 240
Note that we expect multiple matches for 1-to-many relations so the
dplyr
warning can safely be ignored.
Creating an xmap_df
object (EXPERIMENTAL)
The xmap_df
class aims to facilitates additional
functionality such as graph property calculation and printing
(i.e. relation types and crossmap direction) via a custom
print()
method, coercion to other useful classes
(e.g. xmap_to_matrix()
), visualisation (see
vignette("vis-xmaps")
for prototypes), and multi-step
transformations (i.e. from nomenclature A to B to C).
Please note that the xmap_df
class and related functions
are still in active development and subject to breaking changes in
future releases.
If you already have a data frame of candidate links, turn them into a
valid xmap
object by specifying the source
(from
) nodes, target (to
) nodes, and weights
(weights
):
uk_shares <- tibble::tribble(
~key1, ~key2, ~shares,
"UK, Channel Islands, Isle of Man", "Scotland", 0.1102047,
"UK, Channel Islands, Isle of Man", "Wales", 0.02720333,
"UK, Channel Islands, Isle of Man", "England", 0.862592
)
uk_shares |>
as_xmap_df(from = key1, to = key2, weights = shares, tol = 3e-08) |>
print()
## Dropping any additional columns in `uk_shares`
## ℹ To silence set `.drop_extra = TRUE`
## xmap_df:
## split
## (key1 -> key2) BY shares
## key1 key2 shares
## 1 UK, Channel Islands, Isle of Man Scotland 0.11020470
## 2 UK, Channel Islands, Isle of Man Wales 0.02720333
## 3 UK, Channel Islands, Isle of Man England 0.86259200
Note that both verification and coercion will fail without adjusting
the tolerance tol
, which specifies differences to ignore
(i.e. what counts as close enough to 1).
uk_shares |>
verify_links_as_xmap(key1, key2, shares)
## Error in `verify_links_as_xmap()`:
## ! Incomplete mapping weights found
## ✖ `shares` does not sum to 1
## ℹ Modify weights or adjust `tol` and try again.