Tilde Dot in R (~.)
As MrFlick pointed out, these are two separate operators. Together, they provide a special mechanism that allows tidyverse packages to construct lambda functions on the fly. This is best described in ?purrr::as_mapper. Specifically,
If a formula, e.g. ~ .x + 2, it is converted to a function. There are three ways to refer to the arguments:
For a single argument function, use .
For a two argument function, use .x and .y
For more arguments, use ..1, ..2, ..3 etc
Using your example:
purrr::as_mapper( ~. > 5 )
# <lambda>
# function (..., .x = ..1, .y = ..2, . = ..1)
# . > 5
# attr(,"class")
# [1] "rlang_lambda_function"
This creates a function that returns a logical value indicating whether its argument is greater than 5. purrr::detect() generates this function internally and then uses it to traverse the input vector x. The final result is the first element of x that satisfies the "greater than 5" constraint.
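As a small illustration of the mechanism described above, the sketch below builds the same lambda by hand and then lets detect() build it internally. The vector x here is a made-up stand-in for the one in the question, not the original data.

```r
library(purrr)

# Hypothetical input vector (an assumption; the question's own x is not shown here)
x <- c(1, 3, 6, 8)

# Build the lambda explicitly from the formula
gt5 <- as_mapper(~ . > 5)
gt5(3)   # FALSE
gt5(6)   # TRUE

# detect() converts the formula the same way and returns the first match
first_big <- detect(x, ~ . > 5)
first_big   # 6
```

Calling as_mapper() yourself is rarely needed; it is mostly useful for inspecting what a formula turns into.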
As pointed out by Konrad, this mechanism is specific to tidyverse and does not work in general. Outside of tidyverse, the behavior of this syntax is explained in a related question.
Meaning of ~. (tilde dot) argument?
This is a formula, in a shorthand notation. Try this:
plot( mpg ~ cyl, data= mtcars )
The left-hand side is the dependent variable and the right-hand side is the independent variable, much as y = bx + c implies y ~ x.
Formulas are one of the cornerstones of R, and you will need to understand them to use R efficiently. Most frequently, formulas are used in modeling of all sorts; for example, you can do a basic linear regression with
lm( mpg ~ wt, data= mtcars )
...to see how miles per gallon depends on weight. Take a look at ?formula for more explanation.
The dot means "any columns from data that are otherwise not used". Google for "R formulas" to get more information.
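To see what the dot expands to in a model formula, one option is to fit a model with mpg ~ . and look at the resulting terms. This is a sketch using the built-in mtcars data; the object names are only illustrative.

```r
# `.` on the right-hand side stands for every column of `data`
# that is not already used in the formula
fit_all <- lm(mpg ~ ., data = mtcars)

# mtcars has 11 columns, so there are 10 predictors plus the intercept
length(coef(fit_all))   # 11

# terms() shows the expanded right-hand side
dot_terms <- attr(terms(mpg ~ ., data = mtcars), "term.labels")
dot_terms
```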
Use of Tilde (~) and period (.) in R
This overall is known as tidyverse non-standard evaluation (NSE). You probably found out that ~ is also used in formulas to indicate that the left-hand side depends on the right-hand side.
In tidyverse NSE, ~ indicates function(...). Thus, these two expressions are equivalent.
x %>% detect(function(...) ..1 > 5)
#[1] 6
x %>% detect(~.x > 5)
#[1] 6
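~ automatically assigns each argument of the function to the special symbols . and .x (first argument), .y (second argument), and ..1, ..2, ..3, and so on (arguments by position). Note that only the first argument becomes the dot (.); it is not a vector of all the arguments, which is why .[2] in the last example below is NA.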
map2(1, 2, function(x,y) x + y)
#[[1]]
#[1] 3
map2(1, 2, ~.x + .y)
#[[1]]
#[1] 3
map2(1, 2, ~..1 + ..2)
#[[1]]
#[1] 3
map2(1, 2, ~. + ..2)
#[[1]]
#[1] 3
map2(1, 2, ~. + .[2])
#[[1]]
#[1] NA
This automatic assignment can be very helpful when there are many variables.
mtcars %>% pmap_dbl(~ ..1/..4)
# [1] 0.19090909 0.19090909 0.24516129 0.19454545 0.10685714 0.17238095 0.05836735 0.39354839 0.24000000 0.15609756
#[11] 0.14471545 0.09111111 0.09611111 0.08444444 0.05073171 0.04837209 0.06391304 0.49090909 0.58461538 0.52153846
#[21] 0.22164948 0.10333333 0.10133333 0.05428571 0.10971429 0.41363636 0.28571429 0.26902655 0.05984848 0.11257143
#[31] 0.04477612 0.19633028
In addition to all of the special symbols noted above, the arguments are also available through the ... argument. As elsewhere in R, ... can be captured as a named list of arguments with list(...), so you can use it together with with():
mtcars %>% pmap_dbl(~ with(list(...), mpg/hp))
# [1] 0.19090909 0.19090909 0.24516129 0.19454545 0.10685714 0.17238095 0.05836735 0.39354839 0.24000000 0.15609756
#[11] 0.14471545 0.09111111 0.09611111 0.08444444 0.05073171 0.04837209 0.06391304 0.49090909 0.58461538 0.52153846
#[21] 0.22164948 0.10333333 0.10133333 0.05428571 0.10971429 0.41363636 0.28571429 0.26902655 0.05984848 0.11257143
#[31] 0.04477612 0.19633028
Another way to think about why this works is that a data.frame is just a list with some row names:
a <- list(a = c(1,2), b = c("A","B"))
a
#$a
#[1] 1 2
#$b
#[1] "A" "B"
attr(a,"row.names") <- as.character(c(1,2))
class(a) <- "data.frame"
a
# a b
#1 1 A
#2 2 B
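A quick check of the same idea, using only the built-in mtcars data: since a data frame is a list of columns, pmap_dbl() should give the same result whether it receives the data frame or a plain list of its columns. The variable names below are illustrative.

```r
library(purrr)

is.list(mtcars)   # TRUE: a data frame is a list of equal-length columns

# pmap_dbl() walks the columns in parallel; ..1 is mpg and ..4 is hp
from_df   <- pmap_dbl(mtcars, ~ ..1 / ..4)
from_list <- pmap_dbl(as.list(mtcars), ~ ..1 / ..4)

# Both should match the ordinary vectorised calculation
all.equal(unname(from_df), mtcars$mpg / mtcars$hp)
```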
What is meaning of first tilde in purrr::map
As per the map help documentation, map needs a function, but it also accepts a formula, character vector, numeric vector, or list, all of which are converted to functions.
The ~ operator in R creates a formula, so ~ lm(mpg ~ wt, data = .) is a formula. Formulas are useful in R because they prevent immediate evaluation of symbols. For example, you can define
x <- ~f(a+b)
without f, a, or b being defined anywhere. In this case, ~ lm(mpg ~ wt, data = .) is basically a shortcut for function(x) {lm(mpg ~ wt, data = x)} because map can change the value of . in the formula as needed.
Without the tilde, lm(mpg ~ wt, data = .) is just an expression or call in R that's evaluated immediately. The . wouldn't be defined at the time it's called, and map can't convert that into a function.
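The deferred evaluation described above can be checked directly in base R. In this sketch, f, a, and b are deliberately left undefined, matching the example in the text; the error message string is made up for the illustration.

```r
# None of f, a, b exist, yet creating the formula succeeds
x <- ~ f(a + b)
class(x)   # "formula"
x[[2]]     # the unevaluated call: f(a + b)

# Without the tilde the call is evaluated immediately and fails
result <- tryCatch(
  f(a + b),
  error = function(e) "evaluation failed"
)
result   # "evaluation failed"
```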
You can turn these formulas into functions outside of map() with the purrr::as_mapper() function. For example:
myfun <- as_mapper(~lm(mpg ~ wt, data = .))
myfun(mtcars)
# Call:
# lm(formula = mpg ~ wt, data = .)
#
# Coefficients:
# (Intercept) wt
# 37.285 -5.344
myfun
# <lambda>
# function (..., .x = ..1, .y = ..2, . = ..1)
# lm(mpg ~ wt, data = .)
# attr(,"class")
# [1] "rlang_lambda_function"
You can see how the . becomes the first parameter that's passed to that function.
What does the dplyr period character . reference?
The dot is used within dplyr mainly (not exclusively) in mutate_each, summarise_each and do. In the first two (and their SE counterparts) it refers to all the columns to which the functions in funs are applied. In do it refers to the (potentially grouped) data.frame, so you can reference a single column named "xyz" with .$xyz.
The reason you cannot run
filter(df, . == 5)
is that a) filter is not designed to work with multiple columns the way mutate_each is, for example, and b) you would need to use the pipe operator %>% (originally from magrittr).
However, you could use it with a function like rowSums inside filter when combined with the pipe operator %>%:
> filter(mtcars, rowSums(. > 5) > 4)
Error: object '.' not found
> mtcars %>% filter(rowSums(. > 5) > 4) %>% head()
   mpg cyl disp  hp drat    wt  qsec vs am gear carb
1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
3 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
4 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
5 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
6 14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
You should also take a look at the magrittr help files:
library(magrittr)
help("%>%")
From the help page:
Placing lhs elsewhere in rhs call
Often you will want lhs to the rhs call at another position than the first. For this purpose you can use the dot (.) as placeholder. For example, y %>% f(x, .) is equivalent to f(x, y) and z %>% f(x, y, arg = .) is equivalent to f(x, y, arg = z).
Using the dot for secondary purposes
Often, some attribute or property of lhs is desired in the rhs call in addition to the value of lhs itself, e.g. the number of rows or columns. It is perfectly valid to use the dot placeholder several times in the rhs call, but by design the behavior is slightly different when using it inside nested function calls. In particular, if the placeholder is only used in a nested function call, lhs will also be placed as the first argument! The reason for this is that in most use-cases this produces the most readable code. For example, iris %>% subset(1:nrow(.) %% 2 == 0) is equivalent to iris %>% subset(., 1:nrow(.) %% 2 == 0) but slightly more compact. It is possible to overrule this behavior by enclosing the rhs in braces. For example, 1:10 %>% {c(min(.), max(.))} is equivalent to c(min(1:10), max(1:10)).
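The placeholder rules quoted above can be tried on a toy function. This is a sketch; sub_xy is a hypothetical helper defined only for the illustration.

```r
library(magrittr)

# Hypothetical two-argument helper, used only to make argument order visible
sub_xy <- function(x, y) x - y

first_pos  <- 10 %>% sub_xy(3)      # same as sub_xy(10, 3)
second_pos <- 10 %>% sub_xy(3, .)   # same as sub_xy(3, 10)

# Dot only inside a nested call: lhs is still inserted as the first argument
nested <- 1:10 %>% subset(1:length(.) %% 2 == 0)   # subset(1:10, ...)

# Braces switch off the first-argument insertion
braced <- 1:10 %>% {c(min(.), max(.))}
```

Here first_pos is 7 and second_pos is -7, which makes the effect of moving the placeholder easy to see.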
Use of ~ (tilde) in R programming Language
The thing on the right of <- is a formula object. It is often used to denote a statistical model, where the thing on the left of the ~ is the response and the things on the right of the ~ are the explanatory variables. So in English you'd say something like "Species depends on Sepal Length, Sepal Width, Petal Length and Petal Width".
The myFormula <- part of that line stores the formula in an object called myFormula so you can use it in other parts of your R code.
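Once stored, a formula is an ordinary R object that can be inspected and reused. The sketch below assumes the line in question was the iris formula that the English paraphrase above describes; that reconstruction is an assumption, not a quote from the question.

```r
# Assumed reconstruction of the formula from the question
myFormula <- Species ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width

class(myFormula)      # "formula"
myFormula[[2]]        # left-hand side (response): Species
all.vars(myFormula)   # every variable named in the formula

# Reuse the stored formula, here to pull the matching columns out of iris
mf <- model.frame(myFormula, data = iris)
names(mf)
```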
Other common uses of formula objects in R
The lattice package uses them to specify the variables to plot.
The ggplot2 package uses them to specify panels for plotting.
The dplyr package uses them for non-standard evaluation.
In map(), when is it necessary to use a tilde and a period. (~ and .)
The quick answer to your question is that it is never necessary to use the tilde notation when calling map. There are different ways of calling map, and the tilde notation is one of them. You already described the simplest way of calling map, when a function takes only one argument.
df %>% map_dbl(mean)
However, when functions get more complex, there are basically two ways to call them: either with the tilde notation or with a normal anonymous function.
# normal anonymous function
models <- mtcars %>%
  split(.$cyl) %>%
  map(function(x) lm(mpg ~ wt, data = x))
# anonymous mapper function (~)
models <- mtcars %>%
  split(.$cyl) %>%
  map(~ lm(mpg ~ wt, data = .))
The tilde notation basically turns a formula into a function, which is usually easier to read. Each option can also be turned into a named function, as follows. Ideally, the named function reduces the underlying call to a single argument (the one that should be looped over), in which case it can be passed to map like any simple function, without further arguments or notation.
# normal named function notation
lm_mpg_wt <- function(x) {
  lm(mpg ~ wt, data = x)
}
models <- mtcars %>%
  split(.$cyl) %>%
  map(lm_mpg_wt)
# named mapper function
mapper_lm_mpg_wt <- as_mapper(~ lm(mpg ~ wt, data = .))
models <- mtcars %>%
  split(.$cyl) %>%
  map(mapper_lm_mpg_wt)
Basically, these are your options. Choose whichever is easiest and best fits your problem. Named functions are best if you need them again. Many think that mapper functions are easier to read, but at the end of the day that is a matter of personal preference.
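As a further sketch of the same choice, a follow-up step can be written either way. Here the slope for wt is pulled out of each per-cylinder model; the object names are illustrative, and the pipe is left out to keep the example short.

```r
library(purrr)

# One model per cylinder group, as in the examples above
models <- map(split(mtcars, mtcars$cyl), ~ lm(mpg ~ wt, data = .))

# Tilde notation
slopes_tilde <- map_dbl(models, ~ coef(.x)[["wt"]])

# Equivalent ordinary anonymous function
slopes_fun <- map_dbl(models, function(m) coef(m)[["wt"]])

identical(slopes_tilde, slopes_fun)   # TRUE
```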
R combinations with dot (.), ~, and pipe (%>%) operator
That line uses the . in three different ways.
         [1]             [2]      [3]
aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2))
Generally speaking, you pass the value from the pipe into your function at a specific location with ., but there are some exceptions. One exception is when the . is in a formula. The ~ is used to create formulas in R. The pipe won't change the meaning of the formula, so it behaves exactly as it would outside a pipe. For example
aggregate(. ~ cyl, data=mydata)
And that's just because aggregate requires a formula with both a left- and right-hand side. So the . at [1] just means "all the other columns in the dataset." This use is not at all related to magrittr.
The . at [2] is the value that's being passed in by the pipe. If you have a plain . as a parameter to the function, that's where the value will be placed. So the result of the subset() will go to the data= parameter.
The magrittr library also allows you to define anonymous functions with the . variable. If you have a chain that starts with a ., it's treated like a function, so
. %>% mean %>% round(2)
is the same as
function(x) round(mean(x), 2)
so you're just creating a custom function with the . at [3].
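The equivalence claimed for the third dot can be checked directly. This is a sketch with an illustrative name (round_mean) and made-up input values; the final call applies the line discussed above to the whole of mtcars rather than to a subset, which is an assumption made to keep the example self-contained.

```r
library(magrittr)

# A chain that starts with `.` is a function, not an immediate computation
round_mean <- . %>% mean %>% round(2)
is.function(round_mean)   # TRUE

# Same result as the hand-written version
vals <- c(1, 2, 4)        # made-up values
round_mean(vals) == round(mean(vals), 2)   # TRUE

# All three dots together: formula dot, pipe placeholder, function-building dot
by_cyl <- mtcars %>%
  aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2))
by_cyl
```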