What Does the Dplyr Period Character "." Reference

What does the dplyr period character . reference?

The dot is used within dplyr mainly (not exclusively) in mutate_each, summarise_each and do. In the first two (and their SE counterparts) it refers to all the columns to which the functions in funs are applied. In do it refers to the (potentially grouped) data.frame so you can reference single columns by using .$xyz to reference a column named "xyz".
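As a minimal sketch of the do() case, assuming dplyr is installed and using the built-in mtcars data (the column name mean_mpg is made up for illustration), the dot below stands for the data frame of the current group:

```r
library(dplyr)

# Inside do(), `.` is the data frame for the current group,
# so `.$mpg` pulls the mpg column of that group only.
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  do(data.frame(mean_mpg = mean(.$mpg)))

by_cyl
```

The result should have one row per value of cyl, with the group mean of mpg in the second column.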

The reason you cannot run

filter(df, . == 5)

is that a) filter is not designed to work on multiple columns at once, the way mutate_each is, and b) the dot is only defined when you use the pipe operator %>% (originally from magrittr).

However, you could use it with a function like rowSums inside filter when combined with the pipe operator %>%:

> filter(mtcars, rowSums(. > 5) > 4)
Error: object '.' not found

> mtcars %>% filter(rowSums(. > 5) > 4) %>% head()
mpg cyl disp hp drat wt qsec vs am gear carb
1 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
2 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
3 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
4 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
5 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
6 14.3 8 360 245 3.21 3.570 15.84 0 0 3 4

You should also take a look at the magrittr help files:

library(magrittr)
help("%>%")

From the help page:

Placing lhs elsewhere in rhs call
Often you will want lhs to the rhs call at another position than the first. For this purpose you can use the dot (.) as placeholder. For example, y %>% f(x, .) is equivalent to f(x, y) and z %>% f(x, y, arg = .) is equivalent to f(x, y, arg = z).

Using the dot for secondary purposes
Often, some attribute or property of lhs is desired in the rhs call in addition to the value of lhs itself, e.g. the number of rows or columns. It is perfectly valid to use the dot placeholder several times in the rhs call, but by design the behavior is slightly different when using it inside nested function calls. In particular, if the placeholder is only used in a nested function call, lhs will also be placed as the first argument! The reason for this is that in most use-cases this produces the most readable code. For example, iris %>% subset(1:nrow(.) %% 2 == 0) is equivalent to iris %>% subset(., 1:nrow(.) %% 2 == 0) but slightly more compact. It is possible to overrule this behavior by enclosing the rhs in braces. For example, 1:10 %>% {c(min(.), max(.))} is equivalent to c(min(1:10), max(1:10)).
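The three placeholder rules quoted from the help page can be tried out directly. This is a small sketch that assumes only magrittr and the built-in iris data; the variable names a, b and d are arbitrary.

```r
library(magrittr)

# 1. A plain dot as an argument marks where lhs is placed.
a <- 3 %>% rep("x", .)                     # same as rep("x", 3)

# 2. A dot used only inside a nested call: lhs is still inserted
#    as the first argument.
b <- iris %>% subset(1:nrow(.) %% 2 == 0)  # same as subset(iris, 1:nrow(iris) %% 2 == 0)

# 3. Braces switch off the first-argument insertion.
d <- 1:10 %>% {c(min(.), max(.))}          # same as c(min(1:10), max(1:10))
```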

Why do you have to use . when combining dplyr with ggplot?

No, you don't need to use . here. ggplot() takes the data as its first argument, so you can pipe straight into it:

fulldata %>% ggplot(aes(x=FLYTT))+geom_bar()+coord_flip()

Use of Tilde (~) and period (.) in R

This is generally referred to as tidyverse non-standard evaluation (NSE). As you probably found out, ~ is also used in formulas to indicate that the left-hand side depends on the right-hand side.

In tidyverse NSE, ~ is shorthand for function(...), so these two expressions are equivalent:

x %>% detect(function(...) ..1 > 5)
#[1] 6

x %>% detect(~.x > 5)
#[1] 6

~ automatically assigns each argument of the function to the special symbols: . for the first argument; .x and .y for the first two; and ..1, ..2, ..3 and so on for any position. Note that only the first argument is bound to the dot (.).

map2(1, 2, function(x,y) x + y)
#[[1]]
#[1] 3

map2(1, 2, ~.x + .y)
#[[1]]
#[1] 3

map2(1, 2, ~..1 + ..2)
#[[1]]
#[1] 3

map2(1, 2, ~. + ..2)
#[[1]]
#[1] 3

map2(1, 2, ~. + .[2])
#[[1]]
#[1] NA
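One way to see where these symbols come from is to convert a formula by hand. The sketch below assumes purrr is installed and uses purrr::as_mapper(), which turns a one-sided formula into an ordinary function; the vector x and the names add_xy, add_num and double_it are made up for illustration, since the original x is not shown.

```r
library(purrr)

# as_mapper() converts a formula into a regular function.
add_xy    <- as_mapper(~ .x + .y)    # first two arguments by name
add_num   <- as_mapper(~ ..1 + ..2)  # the same arguments by position
double_it <- as_mapper(~ . * 2)      # `.` is the first argument only

is.function(add_xy)
add_xy(1, 2)
add_num(1, 2)
double_it(4)

# Hypothetical input vector for the detect() example.
x <- c(2, 4, 6, 8)
detect(x, ~ .x > 5)
```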

This automatic assignment can be very helpful when there are many variables.

mtcars %>% pmap_dbl(~ ..1/..4)
# [1] 0.19090909 0.19090909 0.24516129 0.19454545 0.10685714 0.17238095 0.05836735 0.39354839 0.24000000 0.15609756
#[11] 0.14471545 0.09111111 0.09611111 0.08444444 0.05073171 0.04837209 0.06391304 0.49090909 0.58461538 0.52153846
#[21] 0.22164948 0.10333333 0.10133333 0.05428571 0.10971429 0.41363636 0.28571429 0.26902655 0.05984848 0.11257143
#[31] 0.04477612 0.19633028

In addition to all of the special symbols noted above, the arguments are also available through the dots (...). As elsewhere in R, ... behaves somewhat like a named list of arguments, so you can wrap it in list(...) and combine it with with():

mtcars %>% pmap_dbl(~ with(list(...), mpg/hp))
# [1] 0.19090909 0.19090909 0.24516129 0.19454545 0.10685714 0.17238095 0.05836735 0.39354839 0.24000000 0.15609756
#[11] 0.14471545 0.09111111 0.09611111 0.08444444 0.05073171 0.04837209 0.06391304 0.49090909 0.58461538 0.52153846
#[21] 0.22164948 0.10333333 0.10133333 0.05428571 0.10971429 0.41363636 0.28571429 0.26902655 0.05984848 0.11257143
#[31] 0.04477612 0.19633028

Another way to see why this works: a data.frame is just a list with row names and a class attribute:

a <- list(a = c(1,2), b = c("A","B"))
a
#$a
#[1] 1 2
#$b
#[1] "A" "B"
attr(a,"row.names") <- as.character(c(1,2))
class(a) <- "data.frame"
a
# a b
#1 1 A
#2 2 B
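Because a data frame is a list of columns, pmap-style functions should treat it the same way as a plain named list. This is a small sketch under that assumption, using purrr and a made-up two-row data frame:

```r
library(purrr)

# Hypothetical toy data: two columns, two rows.
df <- data.frame(mpg = c(21, 22.8), hp = c(110, 93))
as_plain_list <- list(mpg = df$mpg, hp = df$hp)

is.list(df)  # a data frame is a list of equal-length columns

# pmap_dbl() walks over the columns in parallel, i.e. row by row.
from_df   <- pmap_dbl(df, ~ ..1 / ..2)
from_list <- pmap_dbl(as_plain_list, ~ ..1 / ..2)
```

Both calls should give the same ratios as the vectorised df$mpg / df$hp.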

What does the magrittr dot/period (.) operator do when it's at the very beginning of a pipeline?

The confusion here can actually come from two places.

First, yes, the . %>% something() syntax creates a "unary" function that takes one argument. So:

. %>% filter(Species == 'setosa')

is equivalent to

function(.) filter(., Species == 'setosa')
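The equivalence can be checked with a short sketch. It assumes magrittr and the built-in iris data, and uses base subset() instead of filter() so that dplyr is not needed; the names setosa_only and explicit are made up.

```r
library(magrittr)

# A pipeline that starts with a bare dot is stored as a function.
setosa_only <- . %>% subset(Species == "setosa")

# The hand-written equivalent.
explicit <- function(.) subset(., Species == "setosa")

is.function(setosa_only)
identical(setosa_only(iris), explicit(iris))
```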

The second part here is that ggplot2 layers can actually take a function as their data argument. From e.g. ?geom_point:

The data to be displayed in this layer. There are three options:

...

A function will be called with a single argument, the plot data. The return value must be a data.frame, and will be used as the layer data.

So the function that is passed to geom_point will always be applied to the default plot data (i.e. the data defined in ggplot()).
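Putting the two parts together, a layer can receive a dot-pipeline as its data. The following is a sketch that assumes ggplot2 and magrittr are installed; the colours and the choice of iris columns are arbitrary.

```r
library(ggplot2)
library(magrittr)

p <- ggplot(iris, aes(Sepal.Length, Sepal.Width)) +
  # First layer: uses the default plot data (all of iris).
  geom_point(colour = "grey70") +
  # Second layer: the function is applied to the default plot data.
  geom_point(data = . %>% subset(Species == "setosa"), colour = "red")

# layer_data() returns the data a given layer actually draws.
nrow(layer_data(p, 1))
nrow(layer_data(p, 2))
```

The second layer should contain only the setosa rows, even though iris is named only once, in ggplot().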

Note that your linked question concerns the use of . in funs(), which is not directly related to its use here.

How to refer to an argument as character in dplyr filter inside a function

Use rlang::ensym() to capture x as a symbol, which you can then convert using as.character():

library(tidyverse)

per.gender <- function(x) {
  new_name <- codebook_e1 %>%
    filter(Variable == as.character(ensym(x))) %>%
    select(Label) %>%
    pull()

  e1_done %>%
    group_by(koen_new) %>%
    mutate(total_n_gender = n()) %>%
    group_by(koen_new, {{x}}) %>%
    mutate(n_frvl = n()) %>%
    select(n_frvl, total_n_gender) %>%
    mutate(procentandel = n_frvl/total_n_gender) %>%
    distinct(koen_new, {{x}}, procentandel, .keep_all = TRUE) %>%
    filter({{x}} == 1) %>%
    ungroup() %>%
    select(koen_new, !!new_name := procentandel)
}

per.gender(frvlg_1)

Result:

# A tibble: 2 x 2
koen_new `Frvlg: Kultur (Fx Museer, Lokalhistoriske Arkiver, Sangkor, Teater)`
<chr> <dbl>
1 Kvinde 0.0417
2 Mand 0.115

Also note the use of the !! and := operators to use the value referred to by new_name in the final select() statement. Otherwise the column would just be named "new_name".
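Since codebook_e1 and e1_done are not shown, here is a self-contained sketch of the same three ingredients (ensym(), {{ }} and !! with :=) on made-up data. The tables codebook and answers, their columns, and the function share_yes are all hypothetical, and the symbol is converted to a string before filter() to keep the steps separate.

```r
library(dplyr)

# Hypothetical stand-ins for the codebook and the survey data.
codebook <- tibble(Variable = c("q1", "q2"),
                   Label    = c("Likes tea", "Likes coffee"))
answers  <- tibble(group = c("a", "a", "b", "b"),
                   q1    = c(1, 0, 1, 1),
                   q2    = c(0, 0, 1, 0))

share_yes <- function(x) {
  # Capture the argument as a symbol, then turn it into a string.
  var_name <- as.character(rlang::ensym(x))

  # Look up the human-readable label for that variable name.
  new_name <- codebook %>%
    filter(Variable == var_name) %>%
    pull(Label)

  # {{ x }} refers to the column; !!new_name := names the result.
  answers %>%
    group_by(group) %>%
    summarise(!!new_name := mean({{ x }} == 1))
}

res <- share_yes(q1)
res
```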

R combinations with dot (.), ~, and pipe (%>%) operator

That line uses the . in three different ways.

         [1]             [2]      [3]
aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2))

Generally speaking, the . passes the value from the pipe into your function at a specific location, but there are some exceptions. One exception is when the . is in a formula. The ~ is used to create formulas in R. The pipe won't change the meaning of the formula, so the . behaves exactly as it would without the pipe. For example:

aggregate(. ~ cyl, data=mydata)

And that's just because aggregate requires a formula with both a left and right hand side. So the . at [1] just means "all the other columns in the dataset." This use is not at all related to magrittr.

The . at [2] is the value that's being passed in by the pipe. If you have a plain . as a parameter to the function, that's where the value will be placed. So the result of the subset() will go to the data= parameter.

The magrittr library also allows you to define anonymous functions with the . variable. If you have a chain that starts with a ., it's treated like a function. So

. %>% mean %>% round(2)

is the same as

function(x) round(mean(x), 2)

So you're just creating a custom function with the . at [3].
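To see all three uses at once, the line can be run on a small slice of mtcars and compared with a version that uses no magrittr features. This is a sketch: the choice of the columns mpg, cyl and hp is an assumption, since the original pipeline's subset() step is not shown.

```r
library(magrittr)

res <- mtcars[, c("mpg", "cyl", "hp")] %>%
  aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2))

# The same computation without the pipe:
#  [1] the formula dot is spelled out as cbind(mpg, hp)
#  [2] the data is passed by name
#  [3] the function is written out explicitly
plain <- aggregate(cbind(mpg, hp) ~ cyl, data = mtcars,
                   FUN = function(x) round(mean(x), 2))

res
```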

What is the dot (.) notation in R?

The . is the notation for the data passed through %>%.
For example, you can reference specific columns of the data with .$your_column

Take a look at the documentation for the pipe operator.
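A brief sketch of both uses, assuming magrittr and the built-in mtcars data (the names avg_mpg and fit are arbitrary): the dot can stand for the whole data frame, or be indexed with $ to reach a single column.

```r
library(magrittr)

# `.$mpg` extracts one column of the piped data frame.
avg_mpg <- mtcars %>% .$mpg %>% mean()

# A plain `.` places the data frame where the function expects it,
# here in the data argument rather than the first one.
fit <- mtcars %>% lm(mpg ~ wt, data = .)
```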

How does this anonymous function syntax work?

.x refers to each data frame in the list, one per iteration.

tmp <- mtcars %>% split(.$cyl)

So for 1st iteration .x would be tmp[[1]], for second tmp[[2]] and so on. Note that instead of .x you can also use . here which would return the same output.

See documentation in ?map :

There are three ways to refer to the arguments:

For a single argument function, use .

For a two argument function, use .x and .y

For more arguments, use ..1, ..2, ..3 etc
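A short sketch of the single-argument case, assuming purrr and magrittr are installed: the formula body is evaluated once per element of tmp, and .x and . should name the same element.

```r
library(purrr)
library(magrittr)

tmp <- mtcars %>% split(.$cyl)   # list of three data frames, one per cyl

with_x   <- map_dbl(tmp, ~ mean(.x$mpg))
with_dot <- map_dbl(tmp, ~ mean(.$mpg))

# What the first iteration computes, written by hand.
first_by_hand <- mean(tmp[[1]]$mpg)
```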

Loop over averaging period within dplyr statement

A quick thing you can do is use purrr to apply the function to each value from 10 to 35:

library(tidyverse)
library(zoo)

data <- tibble(a=seq(1:1000), b=runif(1000), c=rep(c('x','y','Z','q'), 250))

10:35 %>%
  map_df(~{
    data %>%
      group_by(c) %>%
      mutate(mean = rollmean(a, .x, na.pad=TRUE, align='left')) %>%
      ungroup() %>%
      drop_na() %>%
      group_by(c) %>%
      dplyr::summarize(cor = cor(mean,b)) %>%
      mutate(ndays = .x)
  })
#> # A tibble: 104 x 3
#> c cor ndays
#> <chr> <dbl> <int>
#> 1 q 0.0519 10
#> 2 x -0.123 10
#> 3 y 0.0347 10
#> 4 Z -0.116 10
#> 5 q 0.0571 11
#> 6 x -0.111 11
#> 7 y 0.0379 11
#> 8 Z -0.124 11
#> 9 q 0.0498 12
#> 10 x -0.103 12
#> # … with 94 more rows

Created on 2020-04-02 by the reprex package (v0.3.0)
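If the .x inside the braces is hard to follow, the same loop can be written with a named function instead of a formula. This is a sketch on a smaller made-up data set, assuming dplyr, purrr and zoo are installed; the names toy, cor_for_window and window are hypothetical, fill = NA is used in place of na.pad = TRUE, and filter(!is.na(mean)) stands in for drop_na().

```r
library(dplyr)
library(purrr)
library(zoo)

set.seed(42)  # assumption: fixed seed so the toy data is reproducible
toy <- tibble(a = 1:40, b = runif(40), c = rep(c("x", "y"), 20))

# One averaging period in, one small summary table out.
cor_for_window <- function(window) {
  toy %>%
    group_by(c) %>%
    mutate(mean = rollmean(a, window, fill = NA, align = "left")) %>%
    ungroup() %>%
    filter(!is.na(mean)) %>%   # drop the padded rows
    group_by(c) %>%
    summarise(cor = cor(mean, b)) %>%
    mutate(ndays = window)
}

# Apply the function to each period and row-bind the results.
res <- map_dfr(3:5, cor_for_window)
res
```

Here window plays the role that .x plays in the formula version, which makes it clearer which value each step of the inner pipeline is using.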


