What does the dplyr period character . reference?
The dot is used within dplyr mainly (but not exclusively) in mutate_each, summarise_each and do. In the first two (and their SE counterparts) it refers to all the columns to which the functions in funs are applied. In do it refers to the (potentially grouped) data.frame, so you can reference a single column named "xyz" with .$xyz.
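As a minimal sketch of the do() case, using the built-in mtcars data rather than anything from the question (the names top_two and group_means are illustrative, and do() is superseded in current dplyr releases, though it still works):

```r
library(dplyr)

# Inside do(), `.` is the data frame for the current group
top_two <- mtcars %>%
  group_by(cyl) %>%
  do(head(., 2))

# `.$mpg` pulls a single column out of that per-group data frame
group_means <- mtcars %>%
  group_by(cyl) %>%
  do(data.frame(mean_mpg = mean(.$mpg)))
```

top_two holds the first two rows of each cyl group, and group_means holds one row per group.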
The reason you cannot run
filter(df, . == 5)
is that a) filter is not designed to work with multiple columns the way mutate_each is, for example, and b) the dot only exists when you use the pipe operator %>% (originally from magrittr).
However, you could use it with a function like rowSums inside filter when combined with the pipe operator %>%:
> filter(mtcars, rowSums(. > 5) > 4)
Error: object '.' not found
> mtcars %>% filter(rowSums(. > 5) > 4) %>% head()
   mpg cyl disp  hp drat    wt  qsec vs am gear carb
1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
3 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
4 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
5 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
6 14.3   8  360 245 3.21 3.570 15.84  0  0    3    4
You should also take a look at the magrittr help files:
library(magrittr)
help("%>%")
From the help page:
Placing lhs elsewhere in rhs call
Often you will want lhs to the rhs call at another position than the first. For this purpose you can use the dot (.) as placeholder. For example, y %>% f(x, .) is equivalent to f(x, y) and z %>% f(x, y, arg = .) is equivalent to f(x, y, arg = z).
Using the dot for secondary purposes
Often, some attribute or property of lhs is desired in the rhs call in addition to the value of lhs itself, e.g. the number of rows or columns. It is perfectly valid to use the dot placeholder several times in the rhs call, but by design the behavior is slightly different when using it inside nested function calls. In particular, if the placeholder is only used in a nested function call, lhs will also be placed as the first argument! The reason for this is that in most use-cases this produces the most readable code. For example, iris %>% subset(1:nrow(.) %% 2 == 0) is equivalent to iris %>% subset(., 1:nrow(.) %% 2 == 0) but slightly more compact. It is possible to overrule this behavior by enclosing the rhs in braces. For example, 1:10 %>% {c(min(.), max(.))} is equivalent to c(min(1:10), max(1:10)).
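The placeholder rules quoted from the help page can be tried out with a short sketch. The helper minus() is hypothetical and exists only to make the argument order visible; everything else is magrittr and base R.

```r
library(magrittr)

# Hypothetical helper, used only to make the argument order visible
minus <- function(a, b) a - b

first_arg  <- 10 %>% minus(3)      # lhs becomes the first argument: minus(10, 3)
second_arg <- 10 %>% minus(3, .)   # the dot moves lhs to the second slot: minus(3, 10)

# Dot used only inside a nested call: lhs is still inserted as the first argument
even_rows <- iris %>% subset(1:nrow(.) %% 2 == 0)

# Braces suppress that first-argument insertion
range_vals <- 1:10 %>% {c(min(.), max(.))}
```

Here first_arg is 7, second_arg is -7, even_rows keeps every second row of iris, and range_vals is the pair 1 and 10.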
Why do you have to use . when combining dplyr with ggplot?
No, you don't need to use . here. ggplot() takes the data as its first argument, so the pipe supplies it directly:
fulldata %>% ggplot(aes(x=FLYTT))+geom_bar()+coord_flip()
Use of Tilde (~) and period (.) in R
This is generally known as tidyverse non-standard evaluation (NSE). You have probably seen that ~ is also used in formulas to indicate that the left-hand side depends on the right-hand side.
In tidyverse NSE, ~ is shorthand for function(...). Thus, these two expressions are equivalent.
x %>% detect(function(...) ..1 > 5)
#[1] 6
x %>% detect(~.x > 5)
#[1] 6
~ automatically assigns each argument of the function to the special symbols .; .x, .y; and ..1, ..2, ..3. Note that only the first argument becomes . (the dot).
map2(1, 2, function(x,y) x + y)
#[[1]]
#[1] 3
map2(1, 2, ~.x + .y)
#[[1]]
#[1] 3
map2(1, 2, ~..1 + ..2)
#[[1]]
#[1] 3
map2(1, 2, ~. + ..2)
#[[1]]
#[1] 3
map2(1, 2, ~. + .[2])
#[[1]]
#[1] NA
This automatic assignment can be very helpful when there are many variables.
mtcars %>% pmap_dbl(~ ..1/..4)
# [1] 0.19090909 0.19090909 0.24516129 0.19454545 0.10685714 0.17238095 0.05836735 0.39354839 0.24000000 0.15609756
#[11] 0.14471545 0.09111111 0.09611111 0.08444444 0.05073171 0.04837209 0.06391304 0.49090909 0.58461538 0.52153846
#[21] 0.22164948 0.10333333 0.10133333 0.05428571 0.10971429 0.41363636 0.28571429 0.26902655 0.05984848 0.11257143
#[31] 0.04477612 0.19633028
In addition to all of the special symbols noted above, the arguments are also assigned to ... (the dots). As elsewhere in R, ... behaves somewhat like a named list of arguments, so you can use it together with with():
mtcars %>% pmap_dbl(~ with(list(...), mpg/hp))
# [1] 0.19090909 0.19090909 0.24516129 0.19454545 0.10685714 0.17238095 0.05836735 0.39354839 0.24000000 0.15609756
#[11] 0.14471545 0.09111111 0.09611111 0.08444444 0.05073171 0.04837209 0.06391304 0.49090909 0.58461538 0.52153846
#[21] 0.22164948 0.10333333 0.10133333 0.05428571 0.10971429 0.41363636 0.28571429 0.26902655 0.05984848 0.11257143
#[31] 0.04477612 0.19633028
Another way to think about why this works is that data.frames are just lists with some row names:
a <- list(a = c(1,2), b = c("A","B"))
a
#$a
#[1] 1 2
#$b
#[1] "A" "B"
attr(a,"row.names") <- as.character(c(1,2))
class(a) <- "data.frame"
a
# a b
#1 1 A
#2 2 B
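To inspect the argument binding described above outside of a map call, one option is purrr's as_mapper(), which turns a formula into the function purrr would call. This is a small sketch; the names add_xy, add_dots and first are illustrative.

```r
library(purrr)

# as_mapper() converts a formula into an ordinary function
add_xy   <- as_mapper(~ .x + .y)
add_dots <- as_mapper(~ ..1 + ..2)
first    <- as_mapper(~ .)

sum_xy   <- add_xy(1, 2)     # .x is the first argument, .y the second
sum_dots <- add_dots(1, 2)   # ..1 and ..2 are the same two arguments by position
only_one <- first(1, 2)      # `.` is bound to the first argument only
```

Both sums are 3, while only_one is 1, which is why ~. + .[2] in the example above indexes into a length-one value and returns NA.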
What does the magrittr dot/period (.) operator do when it's at the very beginning of a pipeline?
The confusion here can actually come from two places.
First, yes, the . %>% something() syntax creates a "unary" function that takes one argument. So:
. %>% filter(Species == 'setosa')
is equivalent to
function(.) filter(., Species == 'setosa')
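A quick way to check this equivalence is to store both forms and call them on the same data. The sketch below assumes the built-in iris data; only_setosa and keep_setosa are illustrative names.

```r
library(dplyr)

# A pipeline that starts with `.` is a function, not a value
only_setosa <- . %>% filter(Species == "setosa")

# Explicit equivalent written as an ordinary function
keep_setosa <- function(.) filter(., Species == "setosa")

setosa_a <- only_setosa(iris)
setosa_b <- keep_setosa(iris)
```

Both calls return the same 50 setosa rows.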
The second part here is that ggplot2 layers can actually take a function as their data argument. From e.g. ?geom_point:
The data to be displayed in this layer. There are three options:
...
A function will be called with a single argument, the plot data. The return value must be a data.frame, and will be used as the layer data.
So the function that is passed to geom_point will always be applied to the default plot data (i.e. the data defined in ggplot()).
Note that your linked question concerns the use of . in funs(), which is not directly related to its use here.
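Putting the two parts together, a sketch with the built-in iris data (not the data from the question) might look like this. The second layer receives a function as its data argument and therefore only draws the setosa rows.

```r
library(ggplot2)
library(dplyr)

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  # First layer inherits the full plot data
  geom_point(colour = "grey70") +
  # Second layer: the function is applied to the plot data before drawing
  geom_point(data = . %>% filter(Species == "setosa"), colour = "red")

# layer_data() returns the data actually used by a given layer
n_all    <- nrow(layer_data(p, 1))
n_setosa <- nrow(layer_data(p, 2))
```

Comparing n_all and n_setosa confirms that the second layer saw only the filtered subset.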
How to refer to an argument as character in dplyr filter inside a function
Use rlang::ensym() to capture x as a symbol, which you can then convert using as.character():
library(tidyverse)
per.gender <- function(x) {
  new_name <- codebook_e1 %>%
    filter(Variable == as.character(ensym(x))) %>%
    select(Label) %>%
    pull()
  e1_done %>%
    group_by(koen_new) %>%
    mutate(total_n_gender = n()) %>%
    group_by(koen_new, {{x}}) %>%
    mutate(n_frvl = n()) %>%
    select(n_frvl, total_n_gender) %>%
    mutate(procentandel = n_frvl/total_n_gender) %>%
    distinct(koen_new, {{x}}, procentandel, .keep_all = TRUE) %>%
    filter({{x}} == 1) %>%
    ungroup() %>%
    select(koen_new, !!new_name := procentandel)
}
per.gender(frvlg_1)
Result:
# A tibble: 2 x 2
koen_new `Frvlg: Kultur (Fx Museer, Lokalhistoriske Arkiver, Sangkor, Teater)`
<chr> <dbl>
1 Kvinde 0.0417
2 Mand 0.115
Also note the use of the !! and := operators to use the value referred to by new_name in the final select() statement; otherwise the column would just be named "new_name".
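The two techniques can be isolated in a smaller sketch that runs on the built-in mtcars data. The helpers arg_name() and select_as() are illustrative and not part of the answer above.

```r
library(dplyr)

# Capture the argument as a symbol, then turn it into a string
arg_name <- function(x) as.character(rlang::ensym(x))

# Select one column and name it with the *value* of new_name
select_as <- function(data, col, new_name) {
  data %>% select(!!new_name := {{ col }})
}

captured <- arg_name(mpg)
renamed  <- select_as(mtcars, mpg, "Miles per gallon")
```

captured is the string "mpg", and renamed has a single column called "Miles per gallon".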
R combinations with dot (.), ~, and pipe (%>%) operator
That line uses the . in three different ways.
         [1]             [2]      [3]
aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2))
Generally speaking, you pass the value from the pipe into your function at a specific location with ., but there are some exceptions. One exception is when the . is in a formula. The ~ is used to create formulas in R. The pipe won't change the meaning of the formula, so the . behaves as it would in a formula outside of a pipe. For example
aggregate(. ~ cyl, data=mydata)
And that's just because aggregate requires a formula with both a left- and right-hand side. So the . at [1] just means "all the other columns in the dataset." This use is not at all related to magrittr.
The . at [2] is the value that's being passed in by the pipe. If you have a plain . as an argument to the function, that's where the value will be placed. So the result of the subset() call will go to the data= parameter.
The magrittr library also allows you to define anonymous functions with the . variable. If you have a chain that starts with a ., it's treated like a function. So
. %>% mean %>% round(2)
is the same as
function(x) round(mean(x), 2)
so you're just creating a custom function with the . at [3].
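As a sketch of all three uses in one runnable call, the following assumes the built-in mtcars data and a subset() on three columns (the original question's subset() call is not reproduced here), then repeats the aggregation without any pipes for comparison.

```r
library(magrittr)

by_cyl <- mtcars %>%
  subset(select = c(mpg, cyl, hp)) %>%
  # [1] formula dot, [2] piped-in data, [3] function built from a pipeline
  aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2))

# The same aggregation written without pipes
by_cyl_plain <- aggregate(
  . ~ cyl,
  data = mtcars[, c("mpg", "cyl", "hp")],
  FUN = function(x) round(mean(x), 2)
)
```

Both results have one row per value of cyl, with the rounded group means of mpg and hp.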
What is the dot (.) notation in R?
The . is the notation for the data passed through %>%.
For example, you can reference specific columns of the data with .$your_column.
Take a look at the documentation for the pipe.
How does this anonymous function syntax work?
.x refers to the data frame in each element of the list.
tmp <- mtcars %>% split(.$cyl)
So for the first iteration .x would be tmp[[1]], for the second tmp[[2]], and so on. Note that instead of .x you can also use . here, which would return the same output.
See the documentation in ?map:
There are three ways to refer to the arguments:
For a single argument function, use .
For a two argument function, use .x and .y
For more arguments, use ..1, ..2, ..3 etc
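A small sketch of the single-argument case, using base split() on mtcars so it does not depend on the pipe (the names by_cyl, rows_x, rows_dot and rows_fun are illustrative):

```r
library(purrr)

# One data frame per value of cyl
by_cyl <- split(mtcars, mtcars$cyl)

rows_x   <- map_int(by_cyl, ~ nrow(.x))            # .x is the current list element
rows_dot <- map_int(by_cyl, ~ nrow(.))             # . refers to the same element
rows_fun <- map_int(by_cyl, function(df) nrow(df)) # explicit anonymous function
```

All three calls return the same named integer vector of row counts per group.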
Loop over averaging period within dplyr statement
A quick thing you can do is use purrr to apply the function to each value from 10 to 35:
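library(tidyverse)
library(zoo)
data <- tibble(a = 1:1000, b = runif(1000), c = rep(c('x', 'y', 'Z', 'q'), 250))
# For each window width from 10 to 35, compute the rolling mean of a
# within each group and correlate it with b
10:35 %>%
  map_df(~{
    data %>%
      group_by(c) %>%
      # fill = NA pads the ends (replaces the deprecated na.pad = TRUE)
      mutate(mean = rollmean(a, .x, fill = NA, align = 'left')) %>%
      ungroup() %>%
      drop_na() %>%
      group_by(c) %>%
      dplyr::summarize(cor = cor(mean, b)) %>%
      mutate(ndays = .x)
  })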
#> # A tibble: 104 x 3
#> c cor ndays
#> <chr> <dbl> <int>
#> 1 q 0.0519 10
#> 2 x -0.123 10
#> 3 y 0.0347 10
#> 4 Z -0.116 10
#> 5 q 0.0571 11
#> 6 x -0.111 11
#> 7 y 0.0379 11
#> 8 Z -0.124 11
#> 9 q 0.0498 12
#> 10 x -0.103 12
#> # … with 94 more rows
Created on 2020-04-02 by the reprex package (v0.3.0)