Usage of Dot/Period in R Functions

The dot is a valid character in symbol names, just like any letter, so . is no different from, say, a; it has no special meaning in this context. You can write things like:

> . <- 10
> . + .
[1] 20

It may look strange, but it is valid R. The use of function(.) above is unusual but syntactically valid. Since the author did not reference . in the function body, we will never know whether they meant ... or just used . because they could.
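To make the difference between a single dot and the ellipsis concrete, here is a minimal base R sketch. The function names double_it and sum_all are made up for illustration and do not come from the original question.

```r
# "." is an ordinary argument name; "..." is the variadic ellipsis.
double_it <- function(.) . * 2        # exactly one argument, named "."
sum_all   <- function(...) sum(...)   # any number of arguments

double_it(5)                # 10
sum_all(1, 2, 3)            # 6
names(formals(double_it))   # "."
```

A function declared with function(.) accepts exactly one argument, while function(...) accepts any number of them.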

What does the magrittr dot/period (.) operator do when it's at the very beginning of a pipeline?

The confusion here can actually come from two places.

First, yes, the . %>% something() syntax creates a "unary" function that takes one argument. So:

. %>% filter(Species == 'setosa')

is equivalent to

function(.) filter(., Species == 'setosa')

The second part here is that ggplot2 layers can actually take a function as their data argument. From e.g. ?geom_point:

The data to be displayed in this layer. There are three options:

...

A function will be called with a single argument, the plot data. The return value must be a data.frame, and will be used as the layer data.

So the function that is passed to geom_point will always be applied to the default plot data (i.e. the data defined in ggplot()).
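The function-building behaviour of a leading dot can be seen in isolation, without ggplot2. This is a small sketch that assumes the magrittr package is installed; root_of_sum is a hypothetical name chosen for the example.

```r
library(magrittr)

# A pipeline that starts with "." is not evaluated immediately;
# it yields a function of one argument.
root_of_sum <- . %>% sum() %>% sqrt()

is.function(root_of_sum)   # TRUE
root_of_sum(c(9, 16))      # sqrt(9 + 16) = 5
```

Passing such an object as the data argument of a layer is therefore the same as passing an ordinary one-argument function.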

Note that your linked question concerns the use of . in funs(), which is not directly related to its use here.

What does the dplyr period character . reference?

The dot is used within dplyr mainly (not exclusively) in mutate_each, summarise_each and do. In the first two (and their SE counterparts) it refers to all the columns to which the functions in funs are applied. In do it refers to the (potentially grouped) data.frame so you can reference single columns by using .$xyz to reference a column named "xyz".

The reason you cannot run

filter(df, . == 5)

is that a) filter is not designed to work with multiple columns the way mutate_each is, for example, and b) you would need to use the pipe operator %>% (originally from magrittr).

However, you could use it with a function like rowSums inside filter when combined with the pipe operator %>%:

> filter(mtcars, rowSums(. > 5) > 4)
Error: object '.' not found

> mtcars %>% filter(rowSums(. > 5) > 4) %>% head()
   mpg cyl disp  hp drat    wt  qsec vs am gear carb
1 21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
2 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
3 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
4 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
5 18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
6 14.3   8  360 245 3.21 3.570 15.84  0  0    3    4

You should also take a look at the magrittr help files:

library(magrittr)
help("%>%")

From the help page:

Placing lhs elsewhere in rhs call
Often you will want lhs to the rhs call at another position than the first. For this purpose you can use the dot (.) as placeholder. For example, y %>% f(x, .) is equivalent to f(x, y) and z %>% f(x, y, arg = .) is equivalent to f(x, y, arg = z).

Using the dot for secondary purposes
Often, some attribute or property of lhs is desired in the rhs call in addition to the value of lhs itself, e.g. the number of rows or columns. It is perfectly valid to use the dot placeholder several times in the rhs call, but by design the behavior is slightly different when using it inside nested function calls. In particular, if the placeholder is only used in a nested function call, lhs will also be placed as the first argument! The reason for this is that in most use-cases this produces the most readable code. For example, iris %>% subset(1:nrow(.) %% 2 == 0) is equivalent to iris %>% subset(., 1:nrow(.) %% 2 == 0) but slightly more compact. It is possible to overrule this behavior by enclosing the rhs in braces. For example, 1:10 %>% {c(min(.), max(.))} is equivalent to c(min(1:10), max(1:10)).

Usage of `...` (three-dots or dot-dot-dot) in functions

The word used to describe ... is "ellipsis." Knowing this should make searching for information about the construct easier. For example, the first hit on Google is another question on this site: How to use R's ellipsis feature when writing your own function?
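As a quick illustration of the ellipsis in a user-defined function, here is a minimal base R sketch. The helpers shout and count_args are invented for this example.

```r
# Forward every unnamed argument to paste(); sep stays a normal argument.
shout <- function(..., sep = " ") toupper(paste(..., sep = sep))

# Capture the ellipsis as a list to inspect it.
count_args <- function(...) length(list(...))

shout("hello", "world")     # "HELLO WORLD"
count_args(1, "b", TRUE)    # 3
```

The ellipsis can either be passed on untouched to another function or captured with list(...) when the function needs to look at the arguments itself.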

What does the dot mean in R – personal preference, naming convention or more?

A dot in a function name can mean any of the following:

  • nothing at all
  • a separator between method and class in S3 methods
  • to hide the function name

Possible meanings

1. Nothing at all

The dot in data.frame doesn't separate data from frame, other than visually.

2. Separation of methods and classes in S3 methods

plot is one example of an S3 generic function. Thus plot.lm is the underlying function definition that is used when calling plot(lm(...)); plot(glm(...)) ends up there too, because glm objects inherit from class lm and no separate plot.glm method exists.
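The generic.class naming rule is easy to reproduce with a toy class. This is a minimal base R sketch; the generic describe and the class circle are hypothetical names used only for illustration.

```r
# The generic only decides which method to call.
describe <- function(x, ...) UseMethod("describe")

# Methods are ordinary functions named <generic>.<class>.
describe.circle  <- function(x, ...) paste("circle with radius", x$r)
describe.default <- function(x, ...) "no specific method"

shape <- structure(list(r = 2), class = "circle")

describe(shape)   # "circle with radius 2"
describe(42)      # "no specific method"
```

The dot carries no syntax of its own here; dispatch simply pastes the generic name and the class name together and looks for a function with that name.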

3. To hide internal functions

When writing packages, it is sometimes useful to use leading dots in function names because these functions are somewhat hidden from general view. Functions that are meant to be purely internal to a package sometimes use this.

In this context, "somewhat hidden" simply means that the variable (or function) won't normally show up when you list objects with ls(). To force ls to show these variables, use ls(all.names=TRUE). A leading dot only affects this listing; the variable's scope and value are otherwise unchanged. For example:

x <- 3
.x <- 4

ls()
[1] "x"

ls(all.names=TRUE)
[1] ".x" "x"

x
[1] 3
.x
[1] 4

4. Other possible reasons

In Hadley's plyr package, the convention is to use leading dots in argument names (for example .data, .variables and .fun in ddply). This is a mechanism to try to ensure that, when resolving variable names, the values resolve to the user's variables rather than to the function's internal variables.
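A rough sketch of why dotted names help is shown below. The helper apply_each is hypothetical and is not plyr code; it only mimics the naming convention.

```r
# The helper's own parameters carry a leading dot ...
apply_each <- function(.data, .fun, ...) lapply(.data, .fun, ...)

# ... so a user argument called "data" is free to travel through "...".
scaled <- apply_each(
  list(1:3, 4:6),
  function(x, data) x * data,
  data = 2
)

scaled[[1]]   # 2 4 6
scaled[[2]]   # 8 10 12
```

If the helper's first parameter were named data instead of .data, the named argument data = 2 would be matched to the helper's own parameter rather than being forwarded to the user's function.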


Complications

This mishmash of different uses can lead to very confusing situations, because these different uses can all get mixed up in the same function name.

For example, to convert a data.frame to a list you use as.list():

as.list(iris)

In this case as.list is an S3 generic function, and you are passing a data.frame to it. Thus the method that gets called is as.list.data.frame:

> as.list.data.frame
function (x, ...)
{
x <- unclass(x)
attr(x, "row.names") <- NULL
x
}
<environment: namespace:base>

And for something truly spectacular, load the data.table package and look at the function as.data.table.data.frame:

> library(data.table)

> methods(as.data.table)
[1] as.data.table.data.frame* as.data.table.data.table* as.data.table.matrix*

Non-visible functions are asterisked

> data.table:::as.data.table.data.frame
function (x, keep.rownames = FALSE)
{
if (keep.rownames)
return(data.table(rn = rownames(x), x, keep.rownames = FALSE))
attr(x, "row.names") = .set_row_names(nrow(x))
class(x) = c("data.table", "data.frame")
x
}
<environment: namespace:data.table>

Use of Tilde (~) and period (.) in R

Taken together, this is known as tidyverse non-standard evaluation (NSE). You have probably found that ~ is also used in formulas, to indicate that the left-hand side depends on the right-hand side.

In tidyverse NSE, ~ indicates function(...). Thus, these two expressions are equivalent.

x %>% detect(function(...) ..1 > 5)
#[1] 6

x %>% detect(~.x > 5)
#[1] 6

~ automatically assigns each argument of the function to the special symbols . ; .x and .y; and ..1, ..2, ..3, and so on. Note that only the first argument is bound to the single dot (.).

map2(1, 2, function(x,y) x + y)
#[[1]]
#[1] 3

map2(1, 2, ~.x + .y)
#[[1]]
#[1] 3

map2(1, 2, ~..1 + ..2)
#[[1]]
#[1] 3

map2(1, 2, ~. + ..2)
#[[1]]
#[1] 3

map2(1, 2, ~. + .[2])
#[[1]]
#[1] NA

This automatic assignment can be very helpful when there are many variables.

mtcars %>% pmap_dbl(~ ..1/..4)
# [1] 0.19090909 0.19090909 0.24516129 0.19454545 0.10685714 0.17238095 0.05836735 0.39354839 0.24000000 0.15609756
#[11] 0.14471545 0.09111111 0.09611111 0.08444444 0.05073171 0.04837209 0.06391304 0.49090909 0.58461538 0.52153846
#[21] 0.22164948 0.10333333 0.10133333 0.05428571 0.10971429 0.41363636 0.28571429 0.26902655 0.05984848 0.11257143
#[31] 0.04477612 0.19633028

In addition to all of the special symbols noted above, the arguments are also available through the ellipsis (...). As elsewhere in R, ... behaves much like a named list of arguments, so you can combine it with the with() function:

mtcars %>% pmap_dbl(~ with(list(...), mpg/hp))
# [1] 0.19090909 0.19090909 0.24516129 0.19454545 0.10685714 0.17238095 0.05836735 0.39354839 0.24000000 0.15609756
#[11] 0.14471545 0.09111111 0.09611111 0.08444444 0.05073171 0.04837209 0.06391304 0.49090909 0.58461538 0.52153846
#[21] 0.22164948 0.10333333 0.10133333 0.05428571 0.10971429 0.41363636 0.28571429 0.26902655 0.05984848 0.11257143
#[31] 0.04477612 0.19633028

Another way to think about why this works is that a data.frame is just a list with some row names and a class attribute:

a <- list(a = c(1,2), b = c("A","B"))
a
#$a
#[1] 1 2
#$b
#[1] "A" "B"
attr(a,"row.names") <- as.character(c(1,2))
class(a) <- "data.frame"
a
# a b
#1 1 A
#2 2 B
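The same row-wise computation can be approximated in base R, which shows that nothing here depends on purrr. This is a sketch under that assumption; it swaps pmap_dbl for mapply and uses the arbitrary name mpg_per_hp.

```r
# A data.frame is a list of columns.
is.list(mtcars)   # TRUE

# Hand every column to mapply() as a named argument; inside the
# function the current row arrives through "...".
mpg_per_hp <- do.call(
  mapply,
  c(list(FUN = function(...) with(list(...), mpg / hp)), mtcars)
)

length(mpg_per_hp)   # 32, one value per row
```

Because the columns are passed by name, list(...) inside the function is a named list for the current row, which is what lets with() find mpg and hp.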

Call an R function with run-time generated ellipsis arguments (dot-dot-dot / three dots)

You can do this by passing the function arguments with do.call. First coerce them to a list using as.list.

For example:

input <- c(a = 1, b = 2)
do.call(f, as.list(input))

input <- list(a = 1, b = 2)
do.call(f, as.list(input))
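Since f is not defined in the answer, here is a self-contained sketch with a stand-in f that simply reports the arguments it received through the ellipsis. That definition is an assumption made for the example.

```r
# Stand-in for f: describe whatever arrives through "...".
f <- function(...) {
  args <- list(...)
  paste(names(args), unlist(args), sep = "=", collapse = ", ")
}

input <- c(a = 1, b = 2)

# as.list() turns the named vector into list(a = 1, b = 2);
# do.call() then calls f(a = 1, b = 2).
result <- do.call(f, as.list(input))
result   # "a=1, b=2"
```

The names of the list become argument names in the call, so the function sees the same thing as if f(a = 1, b = 2) had been typed by hand.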

What do . (dot) and % (percentage) mean in R?

. has no inherent/magical meaning in R. It's just another character that you can use in symbol names. But because it is so convenient to type, it has been given special meaning by certain functions and conventions in R. Here are just a few:

  • . is used to look up S3 generic method implementations. For example, if you call a generic function like plot with an object of class lm as the first parameter, then it will look for a function named plot.lm and, if found, call that.
  • often . in formulas means "all other variables", for example lm(y~., data=dd) will regress y on all the other variables in the data.frame dd.
  • libraries like dplyr use it as a special variable name to indicate the current data.frame for methods like do(). They could just as easily have chosen to use the variable name X instead.
  • functions like bquote use .() as a special function to escape variables in expressions
  • variables that start with a period are considered "hidden" and will not show up with ls() unless you call ls(all.names=TRUE) (similar to the UNIX file system behavior)

However, you can also just define a variable named my.awesome.variable<-42 and it will work just like any other variable.
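Two of the conventions listed above, the formula dot and bquote's .(), can be checked quickly in base R. This is a small sketch using the built-in mtcars data as a stand-in for dd.

```r
# "." in a formula: all columns of the data other than the response.
dd  <- mtcars[, c("mpg", "wt", "hp")]
fit <- lm(mpg ~ ., data = dd)
names(coef(fit))          # "(Intercept)" "wt" "hp"

# .() inside bquote(): substitute the value of a variable.
n    <- 3
expr <- bquote(x + .(n))
deparse(expr)             # "x + 3"

# Elsewhere a dot is just part of the name.
my.awesome.variable <- 42
```

In each case the special meaning comes from the function that interprets the dot (lm, bquote), not from the R parser.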

A % by itself doesn't mean anything special, but R allows you to define your own infix operators in the form %<something>% using two percent signs. If you define

`%myfun%` <- function(a,b) {
  a * 3 - b * 2
}

you can call it like

5 %myfun% 2
# [1] 11

What do the dot (.) and the tilde (~) represent in R?

This is R's formula syntax. ~ is read as "as a function of", and . means "all other variables". In this case, you are modelling load_result as a function of every other variable except annual_income.
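The original formula is not shown, so the following sketch only assumes it had the general shape y ~ . - x. It uses the built-in mtcars data in place of the asker's data set.

```r
dd <- mtcars[, c("mpg", "wt", "hp", "qsec")]

# mpg as a function of every other column except hp.
fit_minus <- lm(mpg ~ . - hp, data = dd)

names(coef(fit_minus))   # "(Intercept)" "wt" "qsec"
```

The dot expands to all remaining columns of the data, and the minus sign then removes a term from that expansion.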

Using the %>% pipe, and dot (.) notation

The problem isn't map, but rather how the %>% pipe deals with the dot (.). Consider the following examples (remember that / is a two-argument function in R):

Simple piping:

1 %>% `/`(2)

Is equivalent to `/`(1, 2) or 1 / 2 and gives 0.5.

It is also equivalent to 1 %>% `/`(., 2).

Simple . use:

1 %>% `/`(2, .)

Is equivalent to `/`(2, 1) or 2 / 1 and gives 2.

You can see that 1 is no longer used as the first argument, but only as the second.

Other . use:

This doesn't work, however, when subsetting the dot:

list(a = 1) %>% `/`(.$a, 2)
Error in `/`(., .$a, 2) : operator needs one or two arguments

We can see that . got injected twice, as the first argument and subsetted in the second argument. An expression like .$a is sometimes referred to as a nested function call (the $ function is used inside the / function, in this case).

We use braces to avoid first argument injection:

list(a = 1) %>% { `/`(.$a, 2) }

Gives 0.5 again.

Actual problem:

You are actually calling map(df, df$data, min), not map(df$data, min).

Solution:

Use braces:

df %>% { map(.$data, min) }
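The fix can be reproduced without purrr. The sketch below assumes magrittr is installed, swaps map for base lapply, and uses a plain list as a stand-in for df, whose real structure is not shown in the question.

```r
library(magrittr)

# Stand-in for df: a list with a list-valued "data" element.
df <- list(data = list(c(3, 1, 2), c(9, 7)))

# Without braces the call becomes lapply(df, df$data, min) and fails.
unbraced <- try(df %>% lapply(.$data, min), silent = TRUE)
inherits(unbraced, "try-error")   # TRUE

# With braces only .$data is passed on.
mins <- df %>% { lapply(.$data, min) }
mins[[1]]   # 1
mins[[2]]   # 7
```

The braces make the whole right-hand side an ordinary expression in which the dot stands for the left-hand side, so no extra first argument is inserted.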

Also see the header Using the dot for secondary purposes in ?magrittr::`%>%` which reads:

In particular, if the placeholder is only used in a nested function
call, lhs will also be placed as the first argument! The reason for
this is that in most use-cases this produces the most readable code.
For example, iris %>% subset(1:nrow(.) %% 2 == 0) is equivalent to
iris %>% subset(., 1:nrow(.) %% 2 == 0) but slightly more compact. It
is possible to overrule this behavior by enclosing the rhs in braces.
For example, 1:10 %>% {c(min(.), max(.))} is equivalent to
c(min(1:10), max(1:10)).


