Checking existing pipelines

xmap offers verification functions for mappings encoded in named vectors or lists as well as data frames. Many of these checks seem trivial for short mappings, but they become useful for longer or more complex ones.

For example, if you are using a named list to manually recode categorical variables (e.g. using forcats::fct_recode), then you might want to verify some properties of the level mappings – e.g. each existing level is only mapped once:

library(forcats)
library(xmap)

x <- factor(c("apple", "bear", "banana", "dear"))
levels <- ## new = "old" levels
  c(fruit = "apple", fruit = "banana") |>
  verify_named_all_values_unique()
forcats::fct_recode(x, !!!levels)
## [1] fruit bear  fruit dear 
## Levels: fruit bear dear
mistake_levels <- c(fruit = "apple", fruit = "banana", veg = "banana") |>
  verify_named_all_values_unique()
## Error in `verify_named_all_values_unique()`:
## ! Duplicated values found in `x`.
##  Use `base::duplicated(unlist(unname(x)))` to identify duplicates.
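
The hint in the error message can be followed with base R alone. The sketch below rebuilds the `mistake_levels` values without the verification step and lists the conflicting pairs; the helper names `dup`, `dup_values` and `conflicts` are introduced here purely for illustration.

```r
# Same mapping as `mistake_levels`, built without the verification step
mistake_levels <- c(fruit = "apple", fruit = "banana", veg = "banana")

# Flag each value that repeats an earlier one, as the error message suggests
dup <- base::duplicated(unlist(unname(mistake_levels)))

# Old levels that are mapped more than once
dup_values <- unique(unname(mistake_levels[dup]))

# Every new = "old" pair that involves a duplicated old level
conflicts <- mistake_levels[mistake_levels %in% dup_values]
print(conflicts)
```

Here `conflicts` holds both pairs that map "banana", which makes it easier to decide which one to drop.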

Similarly, if you are using a named list to encode group members, you could check the members against a reference list.

## mistakenly assign kate to another group
student_group_mistake <- list(
  GRP1 = c("kate", "jane", "peter"),
  GRP2 = c("terry", "ben", "grace"),
  GRP2 = c("cindy", "lucy", "kate")
)

student_list <- c("kate", "jane", "peter", "terry", "ben",
                  "grace", "cindy", "lucy", "alex")

student_group_mistake |>
  verify_named_matchset_values_exact(student_list)
## Error in `verify_named_matchset_values_exact()`:
## ! The values of `x` do not exactly match `ref_set`
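
The error message does not say which members are at fault. One way to investigate is with base R set operations, as in the rough sketch below. This is not how xmap implements its check, and the names `assigned`, `missing_students`, `unknown_students` and `repeated_students` are hypothetical.

```r
student_group_mistake <- list(
  GRP1 = c("kate", "jane", "peter"),
  GRP2 = c("terry", "ben", "grace"),
  GRP2 = c("cindy", "lucy", "kate")
)
student_list <- c("kate", "jane", "peter", "terry", "ben",
                  "grace", "cindy", "lucy", "alex")

# Flatten the group members into one unnamed character vector
assigned <- unlist(student_group_mistake, use.names = FALSE)

# Students on the reference list who were never assigned to a group
missing_students <- setdiff(student_list, assigned)

# Assigned names that do not appear on the reference list
unknown_students <- setdiff(assigned, student_list)

# Students assigned to more than one group
repeated_students <- unique(assigned[duplicated(assigned)])

print(list(missing = missing_students,
           unknown = unknown_students,
           repeated = repeated_students))
```

Any non-empty element of the printed list points to a discrepancy worth fixing before the mapping is used.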

If you are redistributing numeric values between categories, see vignette("making-xmaps") for details on how to check that your transformation weights redistribute exactly 100% of your original data.
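
Redistributing exactly 100% means that the weights leaving each source category sum to 1. A minimal base R sketch of that condition follows. It assumes a hypothetical links table `links` with columns `from`, `to` and `weights`, uses an illustrative tolerance of 1e-8, and is not xmap's own check.

```r
# Hypothetical links: "a" is split 70/30, "b" is recoded one-to-one
links <- data.frame(
  from    = c("a", "a", "b"),
  to      = c("X", "Y", "Z"),
  weights = c(0.7, 0.3, 1)
)

# Total outgoing weight for each source category
out_weights <- tapply(links$weights, links$from, sum)

# Each source should pass on (approximately) all of its value
complete <- abs(out_weights - 1) < 1e-8
print(complete)
```

A source whose entry in `complete` is FALSE would either lose or inflate part of its original value during the transformation.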

Crossmap transformations

All crossmap transformations can be decomposed into multiple “standard” data manipulation steps.

  1. Rename original categories into target categories.
  2. Mutate source node values by link weight.
  3. Summarise mutated values by target node.

To implement these steps, use the following dplyr pipeline with verified links:

  1. dplyr::left_join the target categories (to) to the original data via the source labels (from).
  2. dplyr::mutate the source values by multiplying them with the link weights (weights) to transform the original values into redistributed values.
  3. dplyr::group_by and dplyr::summarise values by target groups (to) to complete any many-to-1 mappings.

For example, given some original data (v19_data) with source categories (v19), and a valid xmap with source categories (from = version19), target categories (to = version18) and link weights (weights = w19to18):

# original data
v19_data <- tibble::tribble(
  ~v19, ~count,
  "1120", 300,
  "1121", 400,
  "1130", 200,
  "1200", 600
)

# valid crossmap
xmap_19to18 <- tibble::tribble(
  ~version19, ~version18, ~w19to18,
  # many-to-1 collapsing
  "1120", "A2", 1,
  "1121", "A2", 1,
  # 1-to-1 recoding
  "1130", "A3", 1,
  # 1-to-many redistribution
  "1200", "A4", 0.6,
  "1200", "A5", 0.4
) |>
  verify_links_as_xmap(from = version19, to = version18, weights = w19to18)

# transformed data
(v18_data <- dplyr::left_join(x = v19_data,
                              y = xmap_19to18,
                              by = c(v19 = "version19")) |>
  dplyr::mutate(new_count = count * w19to18) |>
  dplyr::group_by(version18) |>
  dplyr::summarise(v18_count = sum(new_count))
)
## # A tibble: 4 × 2
##   version18 v18_count
##   <chr>         <dbl>
## 1 A2              700
## 2 A3              200
## 3 A4              360
## 4 A5              240

Note that 1-to-many relations produce multiple matches by design, so any dplyr warning about multiple matches in this join can safely be ignored.
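
The same three steps can also be read as a single matrix product: the vector of source counts multiplied by a source-by-target weight matrix. The sketch below builds that matrix by hand in base R from the links in the example above. It does not rely on any xmap function, and the names `links`, `counts`, `W` and `new_counts` are chosen here for illustration.

```r
# Links from the example above as a plain data frame
links <- data.frame(
  from    = c("1120", "1121", "1130", "1200", "1200"),
  to      = c("A2", "A2", "A3", "A4", "A5"),
  weights = c(1, 1, 1, 0.6, 0.4),
  stringsAsFactors = FALSE
)
counts <- c("1120" = 300, "1121" = 400, "1130" = 200, "1200" = 600)

# Weight matrix: one row per source, one column per target, zero where unlinked
sources <- unique(links$from)
targets <- unique(links$to)
W <- matrix(0, nrow = length(sources), ncol = length(targets),
            dimnames = list(sources, targets))
W[cbind(links$from, links$to)] <- links$weights

# Redistribute and aggregate in one step: new_counts = counts %*% W
new_counts <- drop(counts[sources] %*% W)
print(new_counts)

# Weights that sum to 1 per source preserve the grand total
stopifnot(isTRUE(all.equal(sum(new_counts), sum(counts))))
```

Each row of `W` sums to 1, which is why the total of the transformed counts matches the total of the original counts.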

Creating an xmap_df object (EXPERIMENTAL)

The xmap_df class aims to facilitate additional functionality such as graph property calculation and printing (i.e. relation types and crossmap direction) via a custom print() method, coercion to other useful classes (e.g. xmap_to_matrix()), visualisation (see vignette("vis-xmaps") for prototypes), and multi-step transformations (i.e. from nomenclature A to B to C).

Please note that the xmap_df class and related functions are still in active development and subject to breaking changes in future releases.

If you already have a data frame of candidate links, turn them into a valid xmap object by specifying the source (from) nodes, target (to) nodes, and weights (weights):

uk_shares <- tibble::tribble(
  ~key1, ~key2, ~shares,
  "UK, Channel Islands, Isle of Man", "Scotland", 0.1102047,
  "UK, Channel Islands, Isle of Man", "Wales", 0.02720333,
  "UK, Channel Islands, Isle of Man", "England", 0.862592
  )

uk_shares |>
  as_xmap_df(from = key1, to = key2, weights = shares, tol = 3e-08) |>
  print()
## Dropping any additional columns in `uk_shares`
##  To silence set `.drop_extra = TRUE`
## xmap_df:
## split 
## (key1 -> key2) BY shares
##                               key1     key2     shares
## 1 UK, Channel Islands, Isle of Man Scotland 0.11020470
## 2 UK, Channel Islands, Isle of Man    Wales 0.02720333
## 3 UK, Channel Islands, Isle of Man  England 0.86259200

Note that both verification and coercion will fail without adjusting the tolerance tol, which specifies differences to ignore (i.e. what counts as close enough to 1).

uk_shares |>
  verify_links_as_xmap(key1, key2, shares)
## Error in `verify_links_as_xmap()`:
## ! Incomplete mapping weights found
##  `shares` does not sum to 1
##  Modify weights or adjust `tol` and try again.
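
The size of the discrepancy can be inspected directly in base R. In the sketch below, the comparison `abs(total - 1) < tol` is only an assumption about how a tolerance might be applied, not a statement about xmap's internals, and the thresholds 1e-7 and 1e-9 are illustrative.

```r
shares <- c(0.1102047, 0.02720333, 0.862592)

# The shares overshoot 1 by roughly 3e-08
total <- sum(shares)
print(format(total, digits = 10))

# A loose tolerance treats the total as close enough to 1 ...
loose_ok <- abs(total - 1) < 1e-7

# ... while a strict one does not
strict_ok <- abs(total - 1) < 1e-9

print(c(loose = loose_ok, strict = strict_ok))
```

Rounded shares like these rarely sum to exactly 1, so it is worth checking the actual discrepancy before choosing a value for `tol`.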