Using pmap with c(...) part 2
The issue seems to be mixing the custom anonymous/lambda function (function(Weight, Days, ...)
- where the arguments are named as the same as the column name) with the default lambda function (~
- where the arguments are .x
, .y
if only two elements or if more than two - ..1
, ..2
, ..3
etc). In the OP's code
library(dplyr)
library(purrr)
df %>%
mutate(pmap_dfr(., ~ c(..., setNames(rep(Weight, Days), 1:Days))))
The 'Weight', 'Days' returns the full column values from original dataset and not from rows. If we want to still make use of the above command, we need to convert the data captured in each row to a tibble
and use with
df %>%
pmap_dfr(., ~ with(as_tibble(list(...)),
setNames(rep(Weight, Days), seq_len(Days))))
# A tibble: 3 x 7
# `1` `2` `3` `4` `5` `6` `7`
# <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 0.03 0.03 0.03 0.03 0.03 NA NA
#2 0.02 0.02 0.02 0.02 0.02 0.02 0.02
#3 0.04 0.04 0.04 NA NA NA NA
If we want the other columns,
df %>%
pmap_dfr(., ~ c(list(...)[-(3:4)], with(as_tibble(list(...)),
setNames(rep(Weight, Days), seq_len(Days)))))
# A tibble: 3 x 9
# Name School `1` `2` `3` `4` `5` `6` `7`
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Antoine Bach 0.03 0.03 0.03 0.03 0.03 NA NA
#2 Antoine Ken 0.02 0.02 0.02 0.02 0.02 0.02 0.02
#3 Barbara Franklin 0.04 0.04 0.04 NA NA NA NA
Or use rowwise
library(tidyr)
df %>%
rowwise %>%
mutate(out = list(setNames(rep(Weight, Days), seq_len(Days)))) %>%
ungroup %>%
unnest_wider(c(out)) %>%
select(-Weight, -Days)
# A tibble: 3 x 9
# Name School `1` `2` `3` `4` `5` `6` `7`
# <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#1 Antoine Bach 0.03 0.03 0.03 0.03 0.03 NA NA
#2 Antoine Ken 0.02 0.02 0.02 0.02 0.02 0.02 0.02
#3 Barbara Franklin 0.04 0.04 0.04 NA NA NA NA
Using pmap with a to apply different regular expressions to different variables in a tibble?
Here are two tidyverse
ways. One is similar to the data.table
answer, in that it involves reshaping the data, joining it with the configs, and reshaping back to wide. The other is purrr
-based and, in my opinion, a little bit of a weird approach. I'd recommend the first, since it feels more intuitive.
Use tidyr::gather
to make the data long-shaped, then dplyr::left_join
to make sure that every text value from test_target
has a corresponding pattern & replacement—even the cases (col5) without patterns will be retained by using a left join.
library(tidyverse)
...
test_target %>%
gather(key = col, value = text) %>%
left_join(test_config, by = c("col" = "string_col"))
#> # A tibble: 25 x 4
#> col text pattern replacement
#> <chr> <chr> <chr> <chr>
#> 1 col1 Foo "^\\.$" ""
#> 2 col1 bar "^\\.$" ""
#> 3 col1 . "^\\.$" ""
#> 4 col1 NA "^\\.$" ""
#> 5 col1 NULL "^\\.$" ""
#> 6 col2 Foo ^NA$ ""
#> 7 col2 bar ^NA$ ""
#> 8 col2 . ^NA$ ""
#> 9 col2 NA ^NA$ ""
#> 10 col2 NULL ^NA$ ""
#> # ... with 15 more rows
Using an ifelse
replace the pattern where a pattern exists, or keep the original text if the pattern doesn't. Keep just the necessary patterns, add a row number because spread
needs unique IDs, and make the data wide again.
test_target %>%
gather(key = col, value = text) %>%
left_join(test_config, by = c("col" = "string_col")) %>%
mutate(new_text = ifelse(is.na(pattern), text, str_replace(text, pattern, replacement))) %>%
select(col, new_text) %>%
group_by(col) %>%
mutate(row = row_number()) %>%
spread(key = col, value = new_text) %>%
select(-row)
#> # A tibble: 5 x 5
#> col1 col2 col3 col4 col5
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Foo Foo Foo NULL I
#> 2 bar bar bar NA am
#> 3 "" . . Foo not
#> 4 NA "" NA . changing
#> 5 NULL NULL "" bar .
The second way is to make a tiny tibble of just the column names, join that with the configs, and split into a list of lists. Then purrr::map2_dfc
maps over both this list you've created and the columns of test_target
, and returns a data frame by cbind
ing. The reason this works is that data frames are technically lists of columns, so if you map over a data frame, you're treating each column like a list item. I couldn't get a ifelse
to work right here—something in the logic had only single strings coming back instead of the whole vector.
tibble(all_cols = names(test_target)) %>%
left_join(test_config, by = c("all_cols" = "string_col")) %>%
split(.$all_cols) %>%
map(as.list) %>%
map2_dfc(test_target, function(info, text) {
if (is.na(info$pattern)) {
text
} else {
str_replace(text, info$pattern, info$replacement)
}
})
#> # A tibble: 5 x 5
#> col1 col2 col3 col4 col5
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Foo Foo Foo NULL I
#> 2 bar bar bar NA am
#> 3 "" . . Foo not
#> 4 NA "" NA . changing
#> 5 NULL NULL "" bar .
Created on 2018-10-30 by the reprex package (v0.2.1)
Using pmap on a subset of columns
The columns 'x', 'left' and 'right' are list
columns with each element length > 1, but between
'left' and 'right' argument takes only a vector of length 1. In order to make it, we can either expand the dataset by unnest
ing the list
elements and then apply the pmap
params %>%
mutate(x = pmap(., rbinom)) %>%
select(-n) %>%
mutate(left = map2(x, size, ~
binom.agresti.coull(.x, .y)$lower),
right = map2(x,size, ~
binom.agresti.coull(.x,.y)$upper)) %>%
unnest %>%
mutate(coverage = pmap_lgl(list(x, left, right), between))
Or instead of using between
we can use the comparison operators while keeping the list
structure
params %>%
mutate(x = pmap(., rbinom)) %>%
select(-n) %>%
mutate(left = map2(x, size, ~
binom.agresti.coull(.x, .y)$lower),
right = map2(x,size, ~
binom.agresti.coull(.x,.y)$upper),
coverage = pmap(list(x, left, right), ~ ..1 > ..2 & ..1 < ..3)
Using pmap to apply different regular expressions to different variables in a tibble?
You don't need to create a function (your function is actually the source of the problem): you can use str_replace_all
directly.
pmap_dfr(
list(test_target,
test_config$pattern,
test_config$replacement),
str_replace_all
)
# A tibble: 5 x 4
col1 col2 col3 col4
<chr> <chr> <chr> <chr>
1 Foo Foo Foo NULL
2 bar bar bar NA
3 "" . . Foo
4 NA "" NA .
5 NULL NULL "" bar
Create new dataframe based on sequential row values
Maybe this is what you are looking for:
- Reshape to long format using e.g.
tidy::pivot_longer
- Group by
Name
and make use ofdplyr::lag
to add an indicator whether a person is new - Summarise by year
d <- data.frame(
Name = c("Terry", "Sam", "Nic", "Sarah"),
x2017 = c(1, 0, 0, 0),
x2018 = c(1, 0, 1, 1),
x2019 = c(0, 1, 1, 1)
)
library(dplyr)
library(tidyr)
d %>%
tidyr::pivot_longer(-Name, names_to = "year") %>%
mutate(year = gsub("^x", "", year)) %>%
group_by(Name) %>%
mutate(new = as.numeric(value == 1 & lag(value) == 0),
new = ifelse(is.na(new), value, new)) %>%
ungroup() %>%
group_by(year) %>%
summarise(total = sum(value), new = sum(new))
#> # A tibble: 3 x 3
#> year total new
#> <chr> <dbl> <dbl>
#> 1 2017 1 1
#> 2 2018 3 2
#> 3 2019 3 1
purrr pmap function arguments
The misunderstanding here is that your 3rd and 4th options do not have "named arguments" but default argument values. You are supplying a function definition to the .f
argument of pmap
, not a function call.
pmap
is doing partial argument matching the same way that base R does. It may make this clearer to turn on options(warnPartialMatchArgs = TRUE)
. Here I'll take your 3rd example, factoring the function definition out to make it clearer what is happening:
iter_tibble <- tibble::tibble(
a = 1:2,
b = 3:4,
c = 7:6
)
f3 <- function(a1 = a, b1 = b, c1 = c) {
paste(a1, b1, c1)
}
purrr::pmap(iter_tibble, f3)
#> Warning in .f(a = .l[[1L]][[i]], b = .l[[2L]][[i]], c = .l[[3L]][[i]], ...):
#> partial argument match of 'a' to 'a1'
#> Warning in .f(a = .l[[1L]][[i]], b = .l[[2L]][[i]], c = .l[[3L]][[i]], ...):
#> partial argument match of 'b' to 'b1'
#> Warning in .f(a = .l[[1L]][[i]], b = .l[[2L]][[i]], c = .l[[3L]][[i]], ...):
#> partial argument match of 'c' to 'c1'
#> Warning in .f(a = .l[[1L]][[i]], b = .l[[2L]][[i]], c = .l[[3L]][[i]], ...):
#> partial argument match of 'a' to 'a1'
#> Warning in .f(a = .l[[1L]][[i]], b = .l[[2L]][[i]], c = .l[[3L]][[i]], ...):
#> partial argument match of 'b' to 'b1'
#> Warning in .f(a = .l[[1L]][[i]], b = .l[[2L]][[i]], c = .l[[3L]][[i]], ...):
#> partial argument match of 'c' to 'c1'
#> [[1]]
#> [1] "1 3 7"
#>
#> [[2]]
#> [1] "2 4 6"
This is exactly the same as the case with regular R functions that you describe, where supplied named arguments can be abbreviations of the function arguments. To put it another way, for the first row of the table, pmap
basically constructs the call f3(a = 1, b = 3, c = 7)
. a
, b
, and c
come from the column names, the values come from the row.
When trying to evaluate this call, we see that the function f3
does not have an argument a
, but it has an argument a1
. So the named argument a = 1
in the call is partially matched to a1
in the function definition. That's what the partial match warnings describe in the output. No "extensions" are happening. The fact that the argument a1
has a default value of a
is totally irrelevant here.
If you want to call a function and pass values in the tibble to differently named arguments, use a wrapper around it to change the interface. You can do this with a separate named function or (as is very common) using the ~
anonymous function syntax. Using your 5th example:
iter_tibble <- tibble::tibble(
a = 1:2,
b = 3:4,
c = 7:6
)
f5 <- function(b1, obelix, c1) {
paste(b1, obelix, c1)
}
f5_wrapper <- function(a, b, c) {
f5(b1 = b, obelix = a, c1 = c)
}
purrr::pmap(iter_tibble, f5_wrapper)
#> [[1]]
#> [1] "3 1 7"
#>
#> [[2]]
#> [1] "4 2 6"
purrr::pmap(iter_tibble, ~ f5(b1 = ..2, obelix = ..1, c1 = ..3))
#> [[1]]
#> [1] "3 1 7"
#>
#> [[2]]
#> [1] "4 2 6"
Related Topics
Draw Lines Between Different Elements in a Stacked Bar Plot
Dplyr::Select() with Some Variables That May Not Exist in the Data Frame
How to Get the Min/Max Possible Numeric
Is There a Limit for the Possible Number of Nested Ifelse Statements
Replacing Negative Values in a Model (System of Odes) with Zero
Update Subset of Values in a Dataframe Column
How to Pass Individual 'Curvature' Arguments in 'Ggplot2' 'Geom_Curve' Function
R: Convert Date from Character to Datetime
Reversing Y-Axis in an Individual Ggplot Facet
Convert String of Anyformat into Dd-Mm-Yy Hh:Mm:Ss in R
Adding a Simple Lm Trend Line to a Ggplot Boxplot
Using R to Do a Regression with Multiple Dependent and Multiple Independent Variables
How to Change Name of Factor Levels
The Rolling Regression in R Using Roll Apply
Getsymbols and Using Lapply, Cl, and Merge to Extract Close Prices
How to Color Entire Background in Ggplot2 When Using Coord_Fixed
Filling in a New Column Based on a Condition in a Data Frame