How to Use Map from Purrr with Dplyr::Mutate to Create Multiple New Columns Based on Column Pairs

How to use map from purrr with dplyr::mutate to create multiple new columns based on column pairs

Here is one option with purrr. We get the unique prefix of the names of the dataset ('nm1'), use map (from purrr) to loop through the unique names, select the column that matches the prefix value of 'nm1', add the rows using reduce and the bind the columns (bind_cols) with the original dataset

library(tidyverse)
nm1 <- names(df) %>%
substr(1, 1) %>%
unique
nm1 %>%
map(~ df %>%
select(matches(.x)) %>%
reduce(`+`)) %>%
set_names(paste0("sum_", nm1)) %>%
bind_cols(df, .)
# a1 b1 c1 a2 b2 c2 sum_a sum_b sum_c
#1 1 4 10 9 3 15 10 7 25
#2 2 5 11 10 4 16 12 9 27
#3 3 6 12 11 5 17 14 11 29
#4 4 7 13 12 6 18 16 13 31
#5 5 8 14 13 7 19 18 15 33

Apply function to two columns at a time with purrr

We can use pivot_longer to reshape the data, creating a column for each level of ko. Compute the sum, then pivot_wider to get back to your original format:

library(tidyverse)

df %>%
mutate(idx = row_number()) %>%
pivot_longer(-idx, names_sep = '_', names_to = c('group', 'ko')) %>%
pivot_wider(names_from = group, values_from = value) %>%
mutate(sum = l2fc + ctrl) %>%
pivot_wider(names_from = ko, values_from = c(l2fc, ctrl, sum))

idx l2fc_ko1 l2fc_ko2 ctrl_ko1 ctrl_ko2 sum_ko1 sum_ko2
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 -1.04 -0.710 -0.288 -1.65 -1.33 -2.36
2 2 0.0338 0.400 -0.850 0.319 -0.816 0.719
3 3 2.08 0.723 0.325 0.314 2.40 1.04
4 4 0.740 -0.411 -0.307 1.77 0.433 1.36
5 5 0.347 -1.57 -0.153 0.657 0.195 -0.915
6 6 -0.998 -0.145 0.265 -1.95 -0.733 -2.09
7 7 2.05 -0.0876 -0.909 -0.190 1.14 -0.278
8 8 0.0735 -0.134 -2.04 -0.832 -1.96 -0.966
9 9 1.52 2.37 1.53 -0.596 3.05 1.78
10 10 1.42 -0.753 -1.61 1.84 -0.194 1.09

How can I use map* and mutate to convert a list into a set of additional columns?

Using !! and := lets you dynamically name columns. Then, we can reduce the list output of map() with reduce(), which left_joins() all the dataframes in the list using the dataset title and id columns.

df_2 <- 
map(get_concept_list(df),
~ mutate(df,
!!.x := get_concept_info(df, .x))) %>%
reduce(left_join, by = c("dataset_title", "dataset_id"))

df_2

# A tibble: 1 x 6
dataset_title dataset_id time gender c_age measures
<chr> <chr> <list<df[,2]>> <list<df[,2]>> <list<df[,2]>> <list<df[,2]>>
1 Population estimates - local authority based by single year NM_2002_1 [28 x 2] [3 x 2] [121 x 2] [2 x 2]

How to create multiple columns from combination of pairs of another column?

We may use combn on the levels of the 'marital'

library(dplyr)
library(stringr)
data2 <- combn(levels(data$marital), 2, FUN = \(x)
data %>%
transmute(!! str_c(x, collapse="_") :=
case_when(as.character(marital) %in% x~ as.character(marital))),
simplify = FALSE) %>%
bind_cols(data, .)

-output

> head(data2, 2)
# A tibble: 2 × 24
year marital age race rincome partyid relig denom tvhours `No answer_Neve… `No answer_Sepa… `No answer_Divo… `No answer_Wido… `No answer_Marr… `Never married_…
<int> <fct> <int> <fct> <fct> <fct> <fct> <fct> <int> <chr> <chr> <chr> <chr> <chr> <chr>
1 2000 Divorced 48 White $8000 to 9999 Not st… Prot… Bapt… NA <NA> <NA> Divorced <NA> <NA> <NA>
2 2000 Divorced 25 White Not applicable Not st… None Not … 1 <NA> <NA> Divorced <NA> <NA> <NA>
# … with 9 more variables: Never married_Divorced <chr>, Never married_Widowed <chr>, Never married_Married <chr>, Separated_Divorced <chr>,
# Separated_Widowed <chr>, Separated_Married <chr>, Divorced_Widowed <chr>, Divorced_Married <chr>, Widowed_Married <chr>

purrr approach for creating new columns through function with two arguments


  • You should not use df %>% map2(...) if you are passing .x, .y separately to map2.
  • is.na(.x) is not correct since .x is character values (like "x", "y" and "z"). I have used df[[.x]] to subset the values.
  • Since we are not using df %>% ... so row_number() would not work, hence changed it to seq_along.

Here is an approach with map2_dfc to create new columns and we use bind_cols to bind it to original dataframe.

library(dplyr)
library(purrr)

bind_cols(df, map2_dfc(.x = c("x", "y", "z"),
.y = sampling,
.f = ~tibble(!!paste0(.x, "_random") :=
if_else(is.na(df[[.x]]), NA_integer_,
as.integer(seq_along(df[[.x]]) %in% sample(which(!is.na(df[[.x]])), size = .y))))))

# x y z x_random y_random z_random
#1 NA 0.02358698 0.222022714 NA 0 1
#2 0.15099912 NA 0.878007560 0 NA 0
#3 0.20228598 0.92222805 NA 0 0 NA
#4 0.10955137 0.68713928 0.485866574 1 1 1
#5 0.57361508 0.56205208 0.367087414 1 1 0
#6 0.30534642 0.75997029 0.006055428 0 0 1
#7 0.76949447 0.78142772 0.279323093 0 0 0
#8 0.07178739 0.73181961 0.187739444 0 0 1
#9 0.52645525 0.48321814 0.213029355 0 1 0
#10 0.30858707 0.20973381 0.450931534 0 0 0

Using purrr::possibly to catch dynamic error messages

You could tweak purrr::possibly() from its original code to return instead of message the error.

Original code:

## > possibly
## function (.f, otherwise, quiet = TRUE)
## {
## .f <- as_mapper(.f)
## force(otherwise)
## function(...) {
## tryCatch(.f(...), error = function(e) {
## if (!quiet)
## message("Error: ", e$message) ## <--- tweak
## otherwise
## }, interrupt = function(e) {
## stop("Terminated by user", call. = FALSE)
## })
## }
## }

tweaked function:

possibly2 <- function (.f, otherwise, quiet = TRUE) {
.f <- as_mapper(.f)
force(otherwise)
function(...) {
tryCatch(.f(...), error = function(e) {
if (!quiet)
return(e$message) ## <-- tweaked
otherwise
}, interrupt = function(e) {
stop("Terminated by user", call. = FALSE)
})
}
}

Example:

safer_foo <- possibly2(.f = foo, otherwise = "error",
quiet = FALSE ## don't forget to "unquiet"
)

## all other objects / code as in your example

Output:

## > output_list
## $A
## # A tibble: 1 x 1
## sum
## <int>
## 1 55
##
## $B
## [1] "No column with name B found."
##
## $C
## [1] "All values of column C are missing."

edit

Actually, possibly2 carries over code which is no longer needed. Omitting
the unwanted static arguments otherwise and quiet, and skipping the handler for user interrupts, the required code shrinks down to:

possibly2 <- function (.f) {
.f <- as_mapper(.f)
function(...) {
tryCatch(.f(...), error = function(e) e$message)
}
}


Related Topics



Leave a reply



Submit