There are many ways in R to modify categorical variables and to redistribute numeric values between categories. If you are performing a simple recoding of category labels, or collapsing multiple categories, you might use the forcats package. However, if you are doing more complex transformations, you might find yourself writing custom functions or scripts including mutating joins, grouped summary operations, or case-wise transformation of numeric values. Verifying these data wrangling scripts and pipelines becomes more difficult as the number of categories and the complexity of the mappings increase. Current solutions to this issue mostly involve ad hoc validation of the data before and after they are transformed (
The xmap package offers an alternative approach to ensuring your code performs the intended transformations. Instead of inspecting the data, the package provides tools for validating the mapping objects that are used to transform the data. Examples of mapping objects and available verification functions include:
- Named vectors or lists
- Also known as crosswalks and concordance tables
  - verify_pairs() for checking uniqueness and 1-to-1 relations.
- A new graph-based extension of crosswalk tables that also stores redistribution weights for ambiguous 1-to-many relations.
  - verify_links_as_xmap() to check aggregation or disaggregation weights and other desirable properties.
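
To make the idea of validating the mapping rather than the data more concrete, here is a minimal base R sketch of the kinds of checks described above. The helpers `is_1to1()` and `has_valid_weights()`, the example objects, and the column names `from`, `to` and `weight` are all hypothetical and written for illustration only; they are not functions from the xmap package and may not match how the package performs its checks.

```r
# Hypothetical helpers in base R, for illustration only.
# They are NOT part of the xmap package.

# A named vector recoding: names are source codes, values are target codes.
recode_vec <- c(a = "A", b = "B", c = "C")

# 1-to-1 check: no source code and no target code may appear twice.
is_1to1 <- function(x) {
  !anyDuplicated(names(x)) && !anyDuplicated(unname(x))
}

# A table of weighted links: "x2" is split between "y1" and "y2".
links <- data.frame(
  from   = c("x1", "x2", "x2", "x3"),
  to     = c("y1", "y1", "y2", "y2"),
  weight = c(1, 0.4, 0.6, 1)
)

# Weight check: each (from, to) pair appears once, and the outgoing
# weights of every source category sum to 1 (within a tolerance).
has_valid_weights <- function(df, tol = 1e-8) {
  totals <- tapply(df$weight, df$from, sum)
  no_duplicate_links <- !anyDuplicated(df[c("from", "to")])
  no_duplicate_links && all(abs(totals - 1) < tol)
}

# Apply the validated mapping: join, weight, then sum by target category.
values <- data.frame(from = c("x1", "x2", "x3"), value = c(10, 50, 40))
merged <- merge(values, links, by = "from")
merged$share <- merged$value * merged$weight
result <- aggregate(share ~ to, data = merged, FUN = sum)

is_1to1(recode_vec)
has_valid_weights(links)
result
```

Because the weights leaving each source category sum to 1, the total of `result$share` should equal the total of `values$value`. That is the property that lets you trust the transformation after checking only the mapping object.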
See vignette("xmap") to get started using the verification functions in your existing workflows. The functions are based on results obtained by representing and analysing recoding or redistribution transformations as directed, weighted bipartite graphs (i.e. “Crossmaps”). For more information about this underlying graph structure, and the experimental xmap_df class, see