What Do . (Dot) and % (Percentage) Mean in R

What do . (dot) and % (percentage) mean in R?

. has no inherent/magical meaning in R. It's just another character that you can use in symbol names. But because it is so convenient to type, it has been given special meaning by certain functions and conventions in R. Here are just a few

  • . is used look up S3 generic method implementations. For example, if you call a generic function like plot with an object of class lm as the first parameter, then it will look for a function named plot.lm and, if found, call that.
  • often . in formulas means "all other variables", for example lm(y~., data=dd) will regress y on all the other variables in the data.frame dd.
  • libraries like dplyr use it as a special variable name to indicate the current data.frame for methods like do(). They could just as easily have chosen to use the variable name X instead
  • functions like bquote use .() as a special function to escape variables in expressions
  • variables that start with a period are considered "hidden" and will not show up with ls() unless you call ls(all.names=TRUE) (similar to the UNIX file system behavior)

However, you can also just define a variable named my.awesome.variable<-42 and it will work just like any other variable.

A % by itself doesn't mean anything special, but R allows you to define your own infix operators in the form %<something>% using two percent signs. If you define

`%myfun%` <- function(a,b) {
a*3-b*2
}

you can call it like

5 %myfun% 2
# [1] 11

What does the dot mean in R – personal preference, naming convention or more?

A dot in function name can mean any of the following:

  • nothing at all
  • a separator between method and class in S3 methods
  • to hide the function name

Possible meanings

1. Nothing at all

The dot in data.frame doesn't separate data from frame, other than visually.

2. Separation of methods and classes in S3 methods

plot is one example of a generic S3 method. Thus plot.lm and plot.glm are the underlying function definitions that are used when calling plot(lm(...)) or plot(glm(...))

3. To hide internal functions

When writing packages, it is sometimes useful to use leading dots in function names because these functions are somewhat hidden from general view. Functions that are meant to be purely internal to a package sometimes use this.

In this context, "somewhat hidden" simply means that the variable (or function) won't normally show up when you list object with ls(). To force ls to show these variables, use ls(all.names=TRUE). By using a dot as first letter of a variable, you change the scope of the variable itself. For example:

x <- 3
.x <- 4

ls()
[1] "x"

ls(all.names=TRUE)
[1] ".x" "x"

x
[1] 3
.x
[1] 4

4. Other possible reasons

In Hadley's plyr package, he uses the convention to use leading dots in function names. This as a mechanism to try and ensure that when resolving variable names, the values resolve to the user variables rather than internal function variables.


Complications

This mishmash of different uses can lead to very confusing situations, because these different uses can all get mixed up in the same function name.

For example, to convert a data.frame to a list you use as.list(..)

as.list(iris)

In this case as.list is a S3 generic method, and you are passing a data.frame to it. Thus the S3 function is called as.list.data.frame:

> as.list.data.frame
function (x, ...)
{
x <- unclass(x)
attr(x, "row.names") <- NULL
x
}
<environment: namespace:base>

And for something truly spectacular, load the data.table package and look at the function as.data.table.data.frame:

> library(data.table)

> methods(as.data.table)
[1] as.data.table.data.frame* as.data.table.data.table* as.data.table.matrix*

Non-visible functions are asterisked

> data.table:::as.data.table.data.frame
function (x, keep.rownames = FALSE)
{
if (keep.rownames)
return(data.table(rn = rownames(x), x, keep.rownames = FALSE))
attr(x, "row.names") = .set_row_names(nrow(x))
class(x) = c("data.table", "data.frame")
x
}
<environment: namespace:data.table>

What does the . (dot) mean in ~replace_na(., 0)

This one-sided formula is called a lambda-function.

It is a faster way to write simple anonymous functions, using the internal variable as . or .x. I personally prefer .x, as . is already used by dplyr as the left-hand variable of the pipe, which might cause confusion.

In this context (inside mutate_at), ~replace_na(., 0)) and ~replace_na(.x, 0)) are the same as function(x) replace_na(x, 0).

You can try this with the same result:

df <- df %>% mutate_at(vars(var1, var2, var5, var6), function(x) replace_na(x, 0))

Besides, please note that mutate_at is deprecated as for dplyr 1.0. You might want to use the new syntax with the across function:

df <- df %>% mutate(across(c(var1, var2, var5, var6), ~replace_na(.x, 0)))

What does % % function mean in R?

%...% operators

%>% has no builtin meaning but the user (or a package) is free to define operators of the form %whatever% in any way they like. For example, this function will return a string consisting of its left argument followed by a comma and space and then it's right argument.

"%,%" <- function(x, y) paste0(x, ", ", y)

# test run

"Hello" %,% "World"
## [1] "Hello, World"

The base of R provides %*% (matrix mulitiplication), %/% (integer division), %in% (is lhs a component of the rhs?), %o% (outer product) and %x% (kronecker product). It is not clear whether %% falls in this category or not but it represents modulo.

expm The R package, expm, defines a matrix power operator %^%. For an example see Matrix power in R .

operators The operators R package has defined a large number of such operators such as %!in% (for not %in%). See http://cran.r-project.org/web/packages/operators/operators.pdf

igraph This package defines %--% , %->% and %<-% to select edges.

lubridate This package defines %m+% and %m-% to add and subtract months and %--% to define an interval. igraph also defines %--% .

Pipes

magrittr In the case of %>% the magrittr R package has defined it as discussed in the magrittr vignette. See http://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html

magittr has also defined a number of other such operators too. See the Additional Pipe Operators section of the prior link which discusses %T>%, %<>% and %$% and http://cran.r-project.org/web/packages/magrittr/magrittr.pdf for even more details.

dplyr The dplyr R package used to define a %.% operator which is similar; however, it has been deprecated and dplyr now recommends that users use %>% which dplyr imports from magrittr and makes available to the dplyr user. As David Arenburg has mentioned in the comments this SO question discusses the differences between it and magrittr's %>% : Differences between %.% (dplyr) and %>% (magrittr)

pipeR The R package, pipeR, defines a %>>% operator that is similar to magrittr's %>% and can be used as an alternative to it. See http://renkun.me/pipeR-tutorial/

The pipeR package also has defined a number of other such operators too. See: http://cran.r-project.org/web/packages/pipeR/pipeR.pdf

postlogic The postlogic package defined %if% and %unless% operators.

wrapr The R package, wrapr, defines a dot pipe %.>% that is an explicit version of %>% in that it does not do implicit insertion of arguments but only substitutes explicit uses of dot on the right hand side. This can be considered as another alternative to %>%. See https://winvector.github.io/wrapr/articles/dot_pipe.html

Bizarro pipe. This is not really a pipe but rather some clever base syntax to work in a way similar to pipes without actually using pipes. It is discussed in http://www.win-vector.com/blog/2017/01/using-the-bizarro-pipe-to-debug-magrittr-pipelines-in-r/ The idea is that instead of writing:

1:8 %>% sum %>% sqrt
## [1] 6

one writes the following. In this case we explicitly use dot rather than eliding the dot argument and end each component of the pipeline with an assignment to the variable whose name is dot (.) . We follow that with a semicolon.

1:8 ->.; sum(.) ->.; sqrt(.)
## [1] 6

Update Added info on expm package and simplified example at top. Added postlogic package.

Update 2 The development version of R has defined a |> pipe. Unlike magrittr's %>% it can only substitute into the first argument of the right hand side. Although limited, it works via syntax transformation so it has no performance impact.

Meaning of .$VariableName

Here is a simple explanation with a simple example:

iris %>% 
split(.$Species)

The dot(.) basically means take all the data passed into the pipe and split it into groups(for this example) based on Species. When you examine the output, you'll see three "splits" by Species.
Related: Meaning of ~. (tilde dot) argument?

R-changing mean in aggregate function into percentage

I'm going to strongly recommend the tidyverse. Within that set of packages I would approach your problem like so:

library(tidyverse)

percent <-
database %>%
mutate(age_cat = case_when(
under_19 == 1 ~ "below 19",
under_19 == 0 ~ "over 19")) %>%
group_by(year, age_cat) %>%
summarise(count_ = n()) %>%
mutate(percent = count_/sum(count_))

percent %>%
ggplot(aes(x = year, y = percent, color = age_cat)) +
geom_point()

I am assuming that each row is an individual that you would like to count, and you want a summary of the percentage of rows in that year / age_cat grouping. You can also summarise by count_ = sum(insured). Take note that you may want to add the na.rm = TRUE argument to your sum(insured) if you anticipate and need to ignore NA rows.

R combinations with dot ( . ), ~ , and pipe (% %) operator

That line uses the . in three different ways.

         [1]             [2]      [3]
aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2))

Generally speaking you pass in the value from the pipe into your function at a specific location with . but there are some exceptions. One exception is when the . is in a formula. The ~ is used to create formulas in R. The pipe wont change the meaning of the formula, so it behaves like it would without any escaping. For example

aggregate(. ~ cyl, data=mydata)

And that's just because aggregate requires a formula with both a left and right hand side. So the . at [1] just means "all the other columns in the dataset." This use is not at all related to magrittr.

The . at [2] is the value that's being passed in as the pipe. If you have a plain . as a parameter to the function, that's there the value will be placed. So the result of the subset() will go to the data= parameter.

The magrittr library also allows you to define anonymous functions with the . variable. If you have a chain that starts with a ., it's treated like a function. so

. %>% mean %>% round(2)

is the same as

function(x) round(mean(x), 2)

so you're just creating a custom function with the . at [3]

What is the meaning of dot dot in this ggplot expression?

Normally, aesthetics are a 1:1 mapping to a column in the input data.frame. In this case, density is an output of the binning for the histogram. So, this the ggplot2 way of referring to certain derivatives that are a by product of an aggregation, such as the binning for a histogram in this case.

read.table: percent sign (%) and forward slah (/) in headers replaced by dot (.)

R by default tries to makes sure that the dataframe you are importing have syntactically valid names using check.names which is TRUE by default. It does not allow column names with symbols like %, / (or other as defined in make.names).

We can, however, override this behavior using check.names = FALSE

read.table(text = "Subject,Exp1_BSL_SDNN,Exp1_BSL_LF/HF,Exp1_BSL_%LF
s1,123,123,123
s2,123,123,123", sep=",", header=TRUE, check.names = FALSE)

# Subject Exp1_BSL_SDNN Exp1_BSL_LF/HF Exp1_BSL_%LF
#1 s1 123 123 123
#2 s2 123 123 123

Formatting Decimal places in R

Background: Some answers suggested on this page (e.g., signif, options(digits=...)) do not guarantee that a certain number of decimals are displayed for an arbitrary number. I presume this is a design feature in R whereby good scientific practice involves showing a certain number of digits based on principles of "significant figures". However, in many domains (e.g., APA style, business reports) formatting requirements dictate that a certain number of decimal places are displayed. This is often done for consistency and standardisation purposes rather than being concerned with significant figures.

Solution:

The following code shows exactly two decimal places for the number x.

format(round(x, 2), nsmall = 2)

For example:

format(round(1.20, 2), nsmall = 2)
# [1] "1.20"
format(round(1, 2), nsmall = 2)
# [1] "1.00"
format(round(1.1234, 2), nsmall = 2)
# [1] "1.12"

A more general function is as follows where x is the number and k is the number of decimals to show. trimws removes any leading white space which can be useful if you have a vector of numbers.

specify_decimal <- function(x, k) trimws(format(round(x, k), nsmall=k))

E.g.,

specify_decimal(1234, 5)
# [1] "1234.00000"
specify_decimal(0.1234, 5)
# [1] "0.12340"

Discussion of alternatives:

The formatC answers and sprintf answers work fairly well. But they will show negative zeros in some cases which may be unwanted. I.e.,

formatC(c(-0.001), digits = 2, format = "f")
# [1] "-0.00"
sprintf(-0.001, fmt = '%#.2f')
# [1] "-0.00"

One possible workaround to this is as follows:

formatC(as.numeric(as.character(round(-.001, 2))), digits = 2, format = "f")
# [1] "0.00"


Related Topics



Leave a reply



Submit