Using Pmap with C(...) Part 2

Using pmap with c(...) part 2

The issue seems to be mixing the custom anonymous/lambda function (function(Weight, Days, ...) - where the arguments are named as the same as the column name) with the default lambda function (~ - where the arguments are .x, .y if only two elements or if more than two - ..1, ..2, ..3 etc). In the OP's code

library(dplyr)
library(purrr)
df %>%
mutate(pmap_dfr(., ~ c(..., setNames(rep(Weight, Days), 1:Days))))

The 'Weight', 'Days' returns the full column values from original dataset and not from rows. If we want to still make use of the above command, we need to convert the data captured in each row to a tibble and use with

df %>%
pmap_dfr(., ~ with(as_tibble(list(...)),
setNames(rep(Weight, Days), seq_len(Days))))
# A tibble: 3 x 7
# `1` `2` `3` `4` `5` `6` `7`
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 0.03 0.03 0.03 0.03 0.03 NA NA
#2 0.02 0.02 0.02 0.02 0.02 0.02 0.02
#3 0.04 0.04 0.04 NA NA NA NA

If we want the other columns,

df %>%
pmap_dfr(., ~ c(list(...)[-(3:4)], with(as_tibble(list(...)),
setNames(rep(Weight, Days), seq_len(Days)))))
# A tibble: 3 x 9
# Name School `1` `2` `3` `4` `5` `6` `7`
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Antoine Bach 0.03 0.03 0.03 0.03 0.03 NA NA
#2 Antoine Ken 0.02 0.02 0.02 0.02 0.02 0.02 0.02
#3 Barbara Franklin 0.04 0.04 0.04 NA NA NA NA

Or use rowwise

library(tidyr)
df %>%
rowwise %>%
mutate(out = list(setNames(rep(Weight, Days), seq_len(Days)))) %>%
ungroup %>%
unnest_wider(c(out)) %>%
select(-Weight, -Days)
# A tibble: 3 x 9
# Name School `1` `2` `3` `4` `5` `6` `7`
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Antoine Bach 0.03 0.03 0.03 0.03 0.03 NA NA
#2 Antoine Ken 0.02 0.02 0.02 0.02 0.02 0.02 0.02
#3 Barbara Franklin 0.04 0.04 0.04 NA NA NA NA

Using pmap with a to apply different regular expressions to different variables in a tibble?

Here are two tidyverse ways. One is similar to the data.table answer, in that it involves reshaping the data, joining it with the configs, and reshaping back to wide. The other is purrr-based and, in my opinion, a little bit of a weird approach. I'd recommend the first, since it feels more intuitive.

Use tidyr::gather to make the data long-shaped, then dplyr::left_join to make sure that every text value from test_target has a corresponding pattern & replacement—even the cases (col5) without patterns will be retained by using a left join.

library(tidyverse)
...

test_target %>%
gather(key = col, value = text) %>%
left_join(test_config, by = c("col" = "string_col"))
#> # A tibble: 25 x 4
#> col text pattern replacement
#> <chr> <chr> <chr> <chr>
#> 1 col1 Foo "^\\.$" ""
#> 2 col1 bar "^\\.$" ""
#> 3 col1 . "^\\.$" ""
#> 4 col1 NA "^\\.$" ""
#> 5 col1 NULL "^\\.$" ""
#> 6 col2 Foo ^NA$ ""
#> 7 col2 bar ^NA$ ""
#> 8 col2 . ^NA$ ""
#> 9 col2 NA ^NA$ ""
#> 10 col2 NULL ^NA$ ""
#> # ... with 15 more rows

Using an ifelse replace the pattern where a pattern exists, or keep the original text if the pattern doesn't. Keep just the necessary patterns, add a row number because spread needs unique IDs, and make the data wide again.

test_target %>%
gather(key = col, value = text) %>%
left_join(test_config, by = c("col" = "string_col")) %>%
mutate(new_text = ifelse(is.na(pattern), text, str_replace(text, pattern, replacement))) %>%
select(col, new_text) %>%
group_by(col) %>%
mutate(row = row_number()) %>%
spread(key = col, value = new_text) %>%
select(-row)
#> # A tibble: 5 x 5
#> col1 col2 col3 col4 col5
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Foo Foo Foo NULL I
#> 2 bar bar bar NA am
#> 3 "" . . Foo not
#> 4 NA "" NA . changing
#> 5 NULL NULL "" bar .

The second way is to make a tiny tibble of just the column names, join that with the configs, and split into a list of lists. Then purrr::map2_dfc maps over both this list you've created and the columns of test_target, and returns a data frame by cbinding. The reason this works is that data frames are technically lists of columns, so if you map over a data frame, you're treating each column like a list item. I couldn't get a ifelse to work right here—something in the logic had only single strings coming back instead of the whole vector.

tibble(all_cols = names(test_target)) %>%
left_join(test_config, by = c("all_cols" = "string_col")) %>%
split(.$all_cols) %>%
map(as.list) %>%
map2_dfc(test_target, function(info, text) {
if (is.na(info$pattern)) {
text
} else {
str_replace(text, info$pattern, info$replacement)
}
})
#> # A tibble: 5 x 5
#> col1 col2 col3 col4 col5
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Foo Foo Foo NULL I
#> 2 bar bar bar NA am
#> 3 "" . . Foo not
#> 4 NA "" NA . changing
#> 5 NULL NULL "" bar .

Created on 2018-10-30 by the reprex package (v0.2.1)

Using pmap on a subset of columns

The columns 'x', 'left' and 'right' are list columns with each element length > 1, but between 'left' and 'right' argument takes only a vector of length 1. In order to make it, we can either expand the dataset by unnesting the list elements and then apply the pmap

params %>%
mutate(x = pmap(., rbinom)) %>%
select(-n) %>%
mutate(left = map2(x, size, ~
binom.agresti.coull(.x, .y)$lower),
right = map2(x,size, ~
binom.agresti.coull(.x,.y)$upper)) %>%
unnest %>%
mutate(coverage = pmap_lgl(list(x, left, right), between))

Or instead of using between we can use the comparison operators while keeping the list structure

params %>% 
mutate(x = pmap(., rbinom)) %>%
select(-n) %>%
mutate(left = map2(x, size, ~
binom.agresti.coull(.x, .y)$lower),
right = map2(x,size, ~
binom.agresti.coull(.x,.y)$upper),
coverage = pmap(list(x, left, right), ~ ..1 > ..2 & ..1 < ..3)

Using pmap to apply different regular expressions to different variables in a tibble?

You don't need to create a function (your function is actually the source of the problem): you can use str_replace_all directly.

pmap_dfr(
list(test_target,
test_config$pattern,
test_config$replacement),
str_replace_all
)

# A tibble: 5 x 4
col1 col2 col3 col4
<chr> <chr> <chr> <chr>
1 Foo Foo Foo NULL
2 bar bar bar NA
3 "" . . Foo
4 NA "" NA .
5 NULL NULL "" bar

Create new dataframe based on sequential row values

Maybe this is what you are looking for:

  1. Reshape to long format using e.g. tidy::pivot_longer
  2. Group by Name and make use of dplyr::lag to add an indicator whether a person is new
  3. Summarise by year
d <- data.frame(
Name = c("Terry", "Sam", "Nic", "Sarah"),
x2017 = c(1, 0, 0, 0),
x2018 = c(1, 0, 1, 1),
x2019 = c(0, 1, 1, 1)
)

library(dplyr)
library(tidyr)

d %>%
tidyr::pivot_longer(-Name, names_to = "year") %>%
mutate(year = gsub("^x", "", year)) %>%
group_by(Name) %>%
mutate(new = as.numeric(value == 1 & lag(value) == 0),
new = ifelse(is.na(new), value, new)) %>%
ungroup() %>%
group_by(year) %>%
summarise(total = sum(value), new = sum(new))
#> # A tibble: 3 x 3
#> year total new
#> <chr> <dbl> <dbl>
#> 1 2017 1 1
#> 2 2018 3 2
#> 3 2019 3 1

purrr pmap function arguments

The misunderstanding here is that your 3rd and 4th options do not have "named arguments" but default argument values. You are supplying a function definition to the .f argument of pmap, not a function call.

pmap is doing partial argument matching the same way that base R does. It may make this clearer to turn on options(warnPartialMatchArgs = TRUE). Here I'll take your 3rd example, factoring the function definition out to make it clearer what is happening:

iter_tibble <- tibble::tibble(
a = 1:2,
b = 3:4,
c = 7:6
)

f3 <- function(a1 = a, b1 = b, c1 = c) {
paste(a1, b1, c1)
}

purrr::pmap(iter_tibble, f3)
#> Warning in .f(a = .l[[1L]][[i]], b = .l[[2L]][[i]], c = .l[[3L]][[i]], ...):
#> partial argument match of 'a' to 'a1'
#> Warning in .f(a = .l[[1L]][[i]], b = .l[[2L]][[i]], c = .l[[3L]][[i]], ...):
#> partial argument match of 'b' to 'b1'
#> Warning in .f(a = .l[[1L]][[i]], b = .l[[2L]][[i]], c = .l[[3L]][[i]], ...):
#> partial argument match of 'c' to 'c1'
#> Warning in .f(a = .l[[1L]][[i]], b = .l[[2L]][[i]], c = .l[[3L]][[i]], ...):
#> partial argument match of 'a' to 'a1'
#> Warning in .f(a = .l[[1L]][[i]], b = .l[[2L]][[i]], c = .l[[3L]][[i]], ...):
#> partial argument match of 'b' to 'b1'
#> Warning in .f(a = .l[[1L]][[i]], b = .l[[2L]][[i]], c = .l[[3L]][[i]], ...):
#> partial argument match of 'c' to 'c1'
#> [[1]]
#> [1] "1 3 7"
#>
#> [[2]]
#> [1] "2 4 6"

This is exactly the same as the case with regular R functions that you describe, where supplied named arguments can be abbreviations of the function arguments. To put it another way, for the first row of the table, pmap basically constructs the call f3(a = 1, b = 3, c = 7). a, b, and c come from the column names, the values come from the row.

When trying to evaluate this call, we see that the function f3 does not have an argument a, but it has an argument a1. So the named argument a = 1 in the call is partially matched to a1 in the function definition. That's what the partial match warnings describe in the output. No "extensions" are happening. The fact that the argument a1 has a default value of a is totally irrelevant here.

If you want to call a function and pass values in the tibble to differently named arguments, use a wrapper around it to change the interface. You can do this with a separate named function or (as is very common) using the ~ anonymous function syntax. Using your 5th example:

iter_tibble <- tibble::tibble(
a = 1:2,
b = 3:4,
c = 7:6
)

f5 <- function(b1, obelix, c1) {
paste(b1, obelix, c1)
}

f5_wrapper <- function(a, b, c) {
f5(b1 = b, obelix = a, c1 = c)
}

purrr::pmap(iter_tibble, f5_wrapper)
#> [[1]]
#> [1] "3 1 7"
#>
#> [[2]]
#> [1] "4 2 6"
purrr::pmap(iter_tibble, ~ f5(b1 = ..2, obelix = ..1, c1 = ..3))
#> [[1]]
#> [1] "3 1 7"
#>
#> [[2]]
#> [1] "4 2 6"


Related Topics



Leave a reply



Submit