What Is the Meaning of the First Tilde in purrr::map

What is the meaning of the first tilde in purrr::map?

As per the map help documentation, map needs a function, but it also accepts a formula, character vector, numeric vector, or list, all of which are converted to functions.

The ~ operator in R creates a formula. So ~ lm(mpg ~ wt, data = .) is a formula. Formulas are useful in R because they prevent immediate evaluation of symbols. For example, you can define

x <- ~f(a+b)

without f, a or b being defined anywhere. In this case ~ lm(mpg ~ wt, data = .) is basically a shortcut for function(x) {lm(mpg ~ wt, data = x)} because map can change the value of . in the formula as needed.

Without the tilde, lm(mpg ~ wt, data = .) is just an expression or call in R that's evaluated immediately. The . wouldn't be defined at the time that's called and map can't convert that into a function.
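
The delayed evaluation described above can be checked in base R. This is a minimal sketch; no_such_fun, aa, and bb are hypothetical names assumed to be undefined in the session.

```r
# Creating the formula succeeds even though nothing on the right-hand side exists
# (no_such_fun, aa, and bb are hypothetical, assumed-undefined names)
x <- ~ no_such_fun(aa + bb)

class(x)   # "formula"
x[[2]]     # the captured, unevaluated call

# Evaluating the captured call is what fails, because no_such_fun is undefined
result <- tryCatch(eval(x[[2]]), error = function(e) "error")
result
```

The formula only stores the call; nothing is looked up until something such as map() later turns it into a function and runs it.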

You can turn these formulas into functions outside of map() with the purrr::as_mapper() function. For example

myfun <- as_mapper(~lm(mpg ~ wt, data = .))
myfun(mtcars)
# Call:
# lm(formula = mpg ~ wt, data = .)
#
# Coefficients:
# (Intercept) wt
# 37.285 -5.344

myfun
# <lambda>
# function (..., .x = ..1, .y = ..2, . = ..1)
# lm(mpg ~ wt, data = .)
# attr(,"class")
# [1] "rlang_lambda_function"

You can see how the . becomes the first parameter that's passed to that function.
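
The default arguments in that printed signature suggest that ., .x, and ..1 all refer to the same first argument. A small sketch, assuming purrr is installed (f_dot, f_x, and f_dot1 are illustrative names):

```r
library(purrr)

# Three spellings of the same one-argument lambda
f_dot  <- as_mapper(~ . + 1)
f_x    <- as_mapper(~ .x + 1)
f_dot1 <- as_mapper(~ ..1 + 1)

f_dot(2)
f_x(2)
f_dot1(2)
```

Each call should return 3, since all three symbols are bound to the first argument.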

In map(), when is it necessary to use a tilde and a period (~ and .)?

The quick answer to your question is that it is never necessary to use the tilde notation when calling map. There are different ways of calling map, and the tilde notation is one of them. You already described the simplest way of calling map, when a function only takes or needs one argument.

df %>% map_dbl(mean)

However, when functions get more complex, there are basically two ways to call them: with the tilde notation or with a normal anonymous function.

# normal anonymous function
models <- mtcars %>%
  split(.$cyl) %>%
  map(function(x) lm(mpg ~ wt, data = x))

# anonymous mapper function (~)
models <- mtcars %>%
  split(.$cyl) %>%
  map(~ lm(mpg ~ wt, data = .))

The tilde notation basically turns a formula into a function, which is often easier to read. Either option can also be turned into a named function, as follows. Ideally, the named function takes a single argument (the one that should be looped over), in which case it can be passed to map like any simple function, without further arguments or notation.

# normal named function notation
lm_mpg_wt <- function(x) {
  lm(mpg ~ wt, data = x)
}

models <- mtcars %>%
  split(.$cyl) %>%
  map(lm_mpg_wt)

# named mapper function
mapper_lm_mpg_wt <- as_mapper(~ lm(mpg ~ wt, data = .))

models <- mtcars %>%
  split(.$cyl) %>%
  map(mapper_lm_mpg_wt)

Basically, these are your options. Choose whichever is easiest and best suited to your problem. Named functions are best if you need them again. Many find mapper functions easier to read, but in the end it is a matter of personal preference.
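
As a quick sanity check, the sketch below fits the per-cylinder models with an anonymous function, a formula, and a named function, then compares the coefficients. It assumes purrr is installed; by_cyl, fit_anon, fit_formula, fit_named, and coefs are illustrative names, not from the answer above.

```r
library(purrr)

# Split without the pipe so the example needs only purrr
by_cyl <- split(mtcars, mtcars$cyl)

lm_mpg_wt <- function(x) lm(mpg ~ wt, data = x)

fit_anon    <- map(by_cyl, function(x) lm(mpg ~ wt, data = x))
fit_formula <- map(by_cyl, ~ lm(mpg ~ wt, data = .))
fit_named   <- map(by_cyl, lm_mpg_wt)

# Compare the fitted coefficients rather than the model objects,
# because each model stores its own call
coefs <- function(fits) map(fits, coef)
same_coefs <- isTRUE(all.equal(coefs(fit_anon), coefs(fit_formula))) &&
  isTRUE(all.equal(coefs(fit_anon), coefs(fit_named)))
same_coefs
```

The three styles differ only in how the function is written, so the fitted coefficients should agree.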

Use of Tilde (~) and period (.) in R

This is often described as part of tidyverse non-standard evaluation (NSE), although more precisely it is purrr's formula shorthand for anonymous functions. You probably found out that ~ is also used in model formulas to indicate that the left-hand side depends on the right-hand side.

In tidyverse NSE, ~ indicates function(...). Thus, these two expressions are equivalent.

x %>% detect(function(...) ..1 > 5)
#[1] 6

x %>% detect(~.x > 5)
#[1] 6

~ automatically assigns each argument of the function to special symbols: the first argument to ., .x, and ..1; the second to .y and ..2; the third to ..3; and so on. Note that only the first argument is bound to the . symbol.

map2(1, 2, function(x,y) x + y)
#[[1]]
#[1] 3

map2(1, 2, ~.x + .y)
#[[1]]
#[1] 3

map2(1, 2, ~..1 + ..2)
#[[1]]
#[1] 3

map2(1, 2, ~. + ..2)
#[[1]]
#[1] 3

map2(1, 2, ~. + .[2])
#[[1]]
#[1] NA
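
The NA in the last call follows from ordinary vector indexing: . holds only the first argument, so .[2] indexes past the end of a length-one vector. A base R sketch of the same arithmetic (first is an illustrative stand-in for the value bound to .):

```r
# Inside the lambda, . is bound to the first argument only
first <- 1

# Indexing past the end of a length-one vector yields NA
second_element <- first[2]

# Anything plus NA is NA, which is what map2() returned above
total <- first + second_element
total
```

To reach the second argument, use .y or ..2 rather than indexing into the . symbol.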

This automatic assignment can be very helpful when there are many variables.

mtcars %>% pmap_dbl(~ ..1/..4)
# [1] 0.19090909 0.19090909 0.24516129 0.19454545 0.10685714 0.17238095 0.05836735 0.39354839 0.24000000 0.15609756
#[11] 0.14471545 0.09111111 0.09611111 0.08444444 0.05073171 0.04837209 0.06391304 0.49090909 0.58461538 0.52153846
#[21] 0.22164948 0.10333333 0.10133333 0.05428571 0.10971429 0.41363636 0.28571429 0.26902655 0.05984848 0.11257143
#[31] 0.04477612 0.19633028

But in addition to all of the special symbols noted above, the arguments are also passed through the ... (dots) argument. As everywhere in R, list(...) collects them into a list that keeps their names, so you can use it together with with():

mtcars %>% pmap_dbl(~ with(list(...), mpg/hp))
# [1] 0.19090909 0.19090909 0.24516129 0.19454545 0.10685714 0.17238095 0.05836735 0.39354839 0.24000000 0.15609756
#[11] 0.14471545 0.09111111 0.09611111 0.08444444 0.05073171 0.04837209 0.06391304 0.49090909 0.58461538 0.52153846
#[21] 0.22164948 0.10333333 0.10133333 0.05428571 0.10971429 0.41363636 0.28571429 0.26902655 0.05984848 0.11257143
#[31] 0.04477612 0.19633028
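
The same idea on a toy data frame, where the result is easy to verify by hand. This is a sketch assuming purrr is installed; df and its columns a and b are hypothetical. The last variant uses an ordinary function whose argument names match the column names.

```r
library(purrr)

# Hypothetical toy data: b / a is 10 in both rows
df <- data.frame(a = c(1, 2), b = c(10, 20))

# By position: ..1 is column a, ..2 is column b
by_position <- pmap_dbl(df, ~ ..2 / ..1)

# By name: list(...) keeps the column names, so with() can use them
by_name <- pmap_dbl(df, ~ with(list(...), b / a))

# Ordinary function: pmap() matches columns to arguments by name
by_args <- pmap_dbl(df, function(a, b) b / a)

by_position
by_name
by_args
```

All three should give 10 for each row. Referring to columns by name tends to be more robust than ..1 and ..4 if the column order changes.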

Another way to think about why this works is that a data.frame is just a list with row names and a class attribute:

a <- list(a = c(1,2), b = c("A","B"))
a
#$a
#[1] 1 2
#$b
#[1] "A" "B"
attr(a,"row.names") <- as.character(c(1,2))
class(a) <- "data.frame"
a
# a b
#1 1 A
#2 2 B

Tilde Operator in map function

Assuming that you are talking about map from the package purrr, this function is designed to map a function over a vector.

length(unique(iris$Sepal.Length)) is a specific value (35 for the standard iris dataset), so

iris_unique <- map(iris, length(unique(iris$Sepal.Length)))

is equivalent to

iris_unique <- map(iris, 35)

Since 35 is not a function, this is probably not what you mean. However, map() tries to make sense of it. The documentation says that if for the function parameter you pass it a "character vector, numeric vector, or list, it is converted to an extractor function", which means that 35 is converted to a function roughly like function(x) x[[35]]. The net result is to extract the 35th element of each column of iris, that is, the 35th observation.

On the other hand, the documentation also describes how it translates formulas into functions. According to that, the formula ~length(unique(.)) is translated to the function function(x){length(unique(x))}. Since this is a function, it makes perfect sense to map it over a list or vector.
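
The two behaviours can be compared side by side. A sketch assuming purrr is installed, restricted to the four numeric columns of iris to keep the comparison simple; num_cols, picked, and n_unique are illustrative names.

```r
library(purrr)

num_cols <- iris[1:4]

# A number is treated as a position to extract from each column
picked <- map(num_cols, 35)

# A formula is treated as a function to apply to each column
n_unique <- map_int(num_cols, ~ length(unique(.)))

picked$Sepal.Length == iris$Sepal.Length[35]
n_unique[["Sepal.Length"]]
```

The first comparison should be TRUE, and the count for Sepal.Length should match the 35 unique values mentioned above.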

What is tilde in this context of R?

It's a shorthand for an anonymous function that is applied to every group. .x is automatically the input in purrr style anonymous functions (and additionally .y for map2 functions).

But you can use a traditional anonymous function as well:

mtcars %>%
  group_by(cyl) %>%
  # the `.` is just for illustration and can be omitted with the %>%;
  # the `...` absorbs the group key that group_map() passes as a second argument
  group_map(., function(x, ...) head(x, 2L))

Or you can write a named function and use it in group_map():

# `...` absorbs the group key that group_map() passes as a second argument
new_fun <- function(x, ...) {
  head(x, 2L)
}
mtcars %>%
  group_by(cyl) %>%
  group_map(new_fun)

The function you show (head(.x, 2L)) is applied once to every group in the data. You can check how many groups you have with:

mtcars %>%
  group_by(cyl) %>%
  n_groups()
#> [1] 3

For each of these groups, the first two rows of the data are returned:

mtcars %>%
  group_by(cyl) %>%
  group_map(~ head(.x, 2L))
#> [[1]]
#> # A tibble: 2 x 10
#> mpg disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 22.8 108 93 3.85 2.32 18.6 1 1 4 1
#> 2 24.4 147. 62 3.69 3.19 20 1 0 4 2
#>
#> [[2]]
#> # A tibble: 2 x 10
#> mpg disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21 160 110 3.9 2.62 16.5 0 1 4 4
#> 2 21 160 110 3.9 2.88 17.0 0 1 4 4
#>
#> [[3]]
#> # A tibble: 2 x 10
#> mpg disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 18.7 360 175 3.15 3.44 17.0 0 0 3 2
#> 2 14.3 360 245 3.21 3.57 15.8 0 0 3 4
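
The .y symbol mentioned earlier holds the group key in group_map(), as a one-row tibble. A small sketch, assuming dplyr is installed (sizes and labels are illustrative names):

```r
library(dplyr)

# .x is the data for one group (without the grouping column)
sizes <- mtcars %>%
  group_by(cyl) %>%
  group_map(~ nrow(.x))

# .y is a one-row tibble holding the group key
labels <- mtcars %>%
  group_by(cyl) %>%
  group_map(~ paste("cyl", .y$cyl, "has", nrow(.x), "rows"))

unlist(sizes)
unlist(labels)
```

Because group_map() always passes both the data and the key, a function written with function(...) syntax needs to accept two arguments or a `...` to absorb the key.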

