What do . (dot) and % (percentage) mean in R?
A dot (.) has no inherent/magical meaning in R. It's just another character that you can use in symbol names. But because it is so convenient to type, it has been given special meaning by certain functions and conventions in R. Here are just a few:
- . is used to look up S3 generic method implementations. For example, if you call a generic function like plot with an object of class lm as the first parameter, then it will look for a function named plot.lm and, if found, call that.
- Often . in formulas means "all other variables"; for example, lm(y~., data=dd) will regress y on all the other variables in the data.frame dd.
- Libraries like dplyr use it as a special variable name to indicate the current data.frame for methods like do(). They could just as easily have chosen to use the variable name X instead.
- Functions like bquote use .() as a special function to escape variables in expressions.
- Variables that start with a period are considered "hidden" and will not show up with ls() unless you call ls(all.names=TRUE) (similar to the UNIX file system behavior).
However, you can also just define a variable named my.awesome.variable<-42 and it will work just like any other variable.
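Two of the uses listed above (the formula dot and bquote's .()) can be checked with a short base R sketch. The data frame dd and its columns are made up for illustration.

```r
# Toy data frame (hypothetical): y is the response, x1 and x2 are predictors.
dd <- data.frame(y  = c(1, 3, 2, 5, 4),
                 x1 = c(1, 2, 3, 4, 5),
                 x2 = c(2, 1, 4, 3, 6))

# In a formula, the dot expands to every column of `data` not already used.
fit_dot      <- lm(y ~ ., data = dd)
fit_explicit <- lm(y ~ x1 + x2, data = dd)
all.equal(coef(fit_dot), coef(fit_explicit))  # both fits give the same coefficients

# bquote() replaces anything wrapped in .() with its current value.
n <- 10
expr <- bquote(x + .(n))
expr
# x + 10

# Outside such contexts the dot is an ordinary name character.
my.awesome.variable <- 42
```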
A % by itself doesn't mean anything special, but R allows you to define your own infix operators in the form %<something>% using two percent signs. If you define
`%myfun%` <- function(a,b) {
a*3-b*2
}
you can call it like
5 %myfun% 2
# [1] 11
What does the dot mean in R – personal preference, naming convention or more?
A dot in a function name can mean any of the following:
- nothing at all
- a separator between method and class in S3 methods
- to hide the function name
Possible meanings
1. Nothing at all
The dot in data.frame doesn't separate data from frame, other than visually.
2. Separation of methods and classes in S3 methods
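plot is one example of an S3 generic function. Thus plot.lm is the underlying method definition that is used when calling plot(lm(...)). A call to plot(glm(...)) looks for plot.glm first; because glm objects also inherit from class lm, dispatch falls back to plot.lm when no plot.glm method is defined.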
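The generic.class naming rule can be seen with a small self-contained sketch. The generic describe and the class circle are hypothetical names chosen for illustration; they are not part of base R.

```r
# A hypothetical generic: UseMethod() dispatches on the class of `x`.
describe <- function(x, ...) UseMethod("describe")

# Method for class "circle": its name is <generic>.<class>.
describe.circle <- function(x, ...) paste("circle with radius", x$r)

# Fallback used when no class-specific method exists.
describe.default <- function(x, ...) "some object"

shape <- structure(list(r = 2), class = "circle")
describe(shape)
# [1] "circle with radius 2"
describe(1:3)
# [1] "some object"
```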
3. To hide internal functions
When writing packages, it is sometimes useful to use leading dots in function names because these functions are somewhat hidden from general view. Functions that are meant to be purely internal to a package sometimes use this.
In this context, "somewhat hidden" simply means that the variable (or function) won't normally show up when you list objects with ls(). To force ls to show these variables, use ls(all.names=TRUE). Using a dot as the first character of a variable name changes only whether ls() lists it by default; the variable is otherwise an ordinary variable with the same scope. For example:
x <- 3
.x <- 4
ls()
[1] "x"
ls(all.names=TRUE)
[1] ".x" "x"
x
[1] 3
.x
[1] 4
4. Other possible reasons
In Hadley's plyr package, he uses the convention of leading dots in argument names (for example .data and .fun). This is a mechanism to try to ensure that when resolving variable names, the values resolve to the user's variables rather than internal function variables.
Complications
This mishmash of different uses can lead to very confusing situations, because these different uses can all get mixed up in the same function name.
For example, to convert a data.frame to a list you use as.list(..)
as.list(iris)
In this case as.list is an S3 generic function, and you are passing a data.frame to it. Thus the S3 method that gets called is as.list.data.frame:
> as.list.data.frame
function (x, ...)
{
x <- unclass(x)
attr(x, "row.names") <- NULL
x
}
<environment: namespace:base>
And for something truly spectacular, load the data.table package and look at the function as.data.table.data.frame:
> library(data.table)
> methods(as.data.table)
[1] as.data.table.data.frame* as.data.table.data.table* as.data.table.matrix*
Non-visible functions are asterisked
> data.table:::as.data.table.data.frame
function (x, keep.rownames = FALSE)
{
if (keep.rownames)
return(data.table(rn = rownames(x), x, keep.rownames = FALSE))
attr(x, "row.names") = .set_row_names(nrow(x))
class(x) = c("data.table", "data.frame")
x
}
<environment: namespace:data.table>
What does the . (dot) mean in ~replace_na(., 0)
This one-sided formula is called a lambda function.
It is a quicker way to write simple anonymous functions, using . or .x as the internal variable. I personally prefer .x, as . is already used as the placeholder for the left-hand side of the %>% pipe that dplyr provides, which might cause confusion.
In this context (inside mutate_at), ~replace_na(., 0) and ~replace_na(.x, 0) are the same as function(x) replace_na(x, 0).
You can try this with the same result:
df <- df %>% mutate_at(vars(var1, var2, var5, var6), function(x) replace_na(x, 0))
Besides, please note that mutate_at is superseded as of dplyr 1.0. You might want to use the new syntax with the across function:
df <- df %>% mutate(across(c(var1, var2, var5, var6), ~replace_na(.x, 0)))
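To make the formula shorthand less mysterious, here is a minimal base R sketch of how a one-sided formula could be turned into a function. The helper as_lambda is hypothetical and written only for illustration; it is not how dplyr or purrr implement this. It uses ifelse() in place of replace_na() so that no packages are needed.

```r
# Hypothetical helper: convert a one-sided formula into a one-argument function.
as_lambda <- function(f) {
  stopifnot(inherits(f, "formula"), length(f) == 2)
  body_expr <- f[[2]]        # the expression to the right of ~
  env <- environment(f)      # the environment where the formula was written
  # Evaluate the expression with both . and .x bound to the argument.
  function(x) eval(body_expr, list(. = x, .x = x), env)
}

replace_missing <- as_lambda(~ ifelse(is.na(.x), 0, .x))
replace_missing(c(1, NA, 3))
# [1] 1 0 3

same_with_dot <- as_lambda(~ ifelse(is.na(.), 0, .))
same_with_dot(c(1, NA, 3))
# [1] 1 0 3
```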
What does % % function mean in R?
%...% operators
%>% has no builtin meaning but the user (or a package) is free to define operators of the form %whatever% in any way they like. For example, this function will return a string consisting of its left argument followed by a comma and space and then its right argument.
"%,%" <- function(x, y) paste0(x, ", ", y)
# test run
"Hello" %,% "World"
## [1] "Hello, World"
Base R provides %*% (matrix multiplication), %/% (integer division), %in% (is lhs a component of the rhs?), %o% (outer product) and %x% (Kronecker product). It is not clear whether %% falls in this category or not but it represents modulo.
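A quick sketch of the base operators just listed, using only base R; the matrix m is an arbitrary example.

```r
m <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)

m %*% m            # matrix multiplication
7 %/% 2            # integer division
# [1] 3
7 %% 2             # modulo
# [1] 1
3 %in% c(1, 2, 3)  # membership test
# [1] TRUE
1:2 %o% 1:3        # outer product, a 2 x 3 matrix
diag(2) %x% m      # Kronecker product, a 4 x 4 matrix
```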
expm The R package, expm, defines a matrix power operator %^%. For an example see Matrix power in R.
operators The operators R package has defined a large number of such operators such as %!in% (for not %in%). See http://cran.r-project.org/web/packages/operators/operators.pdf
igraph This package defines %--% , %->% and %<-% to select edges.
lubridate This package defines %m+% and %m-% to add and subtract months and %--% to define an interval. igraph also defines %--% .
Pipes
magrittr In the case of %>% the magrittr R package has defined it as discussed in the magrittr vignette. See http://cran.r-project.org/web/packages/magrittr/vignettes/magrittr.html
magrittr has also defined a number of other such operators. See the Additional Pipe Operators section of the prior link, which discusses %T>%, %<>% and %$%, and http://cran.r-project.org/web/packages/magrittr/magrittr.pdf for even more details.
dplyr The dplyr R package used to define a %.% operator which is similar; however, it has been deprecated and dplyr now recommends that users use %>%, which dplyr imports from magrittr and makes available to the dplyr user. As David Arenburg has mentioned in the comments this SO question discusses the differences between it and magrittr's %>%: Differences between %.% (dplyr) and %>% (magrittr)
pipeR The R package, pipeR, defines a %>>% operator that is similar to magrittr's %>% and can be used as an alternative to it. See http://renkun.me/pipeR-tutorial/
The pipeR package has also defined a number of other such operators. See: http://cran.r-project.org/web/packages/pipeR/pipeR.pdf
postlogic The postlogic package defined %if% and %unless% operators.
wrapr The R package, wrapr, defines a dot pipe %.>% that is an explicit version of %>% in that it does not do implicit insertion of arguments but only substitutes explicit uses of dot on the right hand side. This can be considered as another alternative to %>%. See https://winvector.github.io/wrapr/articles/dot_pipe.html
Bizarro pipe. This is not really a pipe but rather some clever base syntax to work in a way similar to pipes without actually using pipes. It is discussed in http://www.win-vector.com/blog/2017/01/using-the-bizarro-pipe-to-debug-magrittr-pipelines-in-r/ The idea is that instead of writing:
1:8 %>% sum %>% sqrt
## [1] 6
one writes the following. In this case we explicitly use dot rather than eliding the dot argument and end each component of the pipeline with an assignment to the variable whose name is dot (.). We follow that with a semicolon.
1:8 ->.; sum(.) ->.; sqrt(.)
## [1] 6
Native pipe Since version 4.1.0, R itself provides a |> pipe. Unlike magrittr's %>%, as originally released it can only substitute into the first argument of the right hand side. Although limited, it works via syntax transformation so it has no performance impact.
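As a small sketch, assuming R 4.1.0 or later (where |> is available), the earlier pipeline can be written with the native pipe. Note that the right hand side must be a function call, so the parentheses after sum and sqrt are required.

```r
# Native pipe: the left-hand value becomes the first argument of the call.
result <- 1:8 |> sum() |> sqrt()
result
# [1] 6

# The parser rewrites the line above into this nested call.
nested <- sqrt(sum(1:8))
identical(result, nested)
# [1] TRUE
```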
Meaning of .$VariableName
Here is a simple explanation with a simple example:
iris %>%
split(.$Species)
The dot (.) basically means: take all the data passed into the pipe and split it into groups (for this example) based on Species. When you examine the output, you'll see three "splits" by Species.
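For comparison, here is the same split written without a pipe, in base R only. The data frame that the dot stood for is simply named twice.

```r
# Equivalent to iris %>% split(.$Species), with the dot written out as iris.
groups <- split(iris, iris$Species)

names(groups)
# [1] "setosa"     "versicolor" "virginica"
sapply(groups, nrow)   # number of rows in each split
```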
Related: Meaning of ~. (tilde dot) argument?
R-changing mean in aggregate function into percentage
I'm going to strongly recommend the tidyverse. Within that set of packages I would approach your problem like so:
library(tidyverse)
percent <-
database %>%
mutate(age_cat = case_when(
under_19 == 1 ~ "below 19",
under_19 == 0 ~ "over 19")) %>%
group_by(year, age_cat) %>%
summarise(count_ = n()) %>%
mutate(percent = count_/sum(count_))
percent %>%
ggplot(aes(x = year, y = percent, color = age_cat)) +
geom_point()
I am assuming that each row is an individual that you would like to count, and you want a summary of the percentage of rows in that year / age_cat grouping. You can also summarise by count_ = sum(insured). Take note that you may want to add the na.rm = TRUE argument to your sum(insured) if you anticipate and need to ignore NA rows.
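The arithmetic behind count_/sum(count_) can be checked on toy data. The sketch below is not the dplyr pipeline above: it uses base R's table() and prop.table() as a stand-in, and the six-row database is invented for illustration.

```r
# Hypothetical data: one row per individual.
database <- data.frame(
  year     = c(2019, 2019, 2019, 2019, 2020, 2020),
  under_19 = c(1, 0, 0, 0, 1, 0)
)
database$age_cat <- ifelse(database$under_19 == 1, "below 19", "over 19")

# Count rows per year / age_cat, then divide each count by its year total.
counts <- table(database$year, database$age_cat)
shares <- prop.table(counts, margin = 1)   # each row (year) sums to 1
shares
```

With this data, 2019 has one of four rows below 19 (a share of 0.25) and 2020 has one of two (a share of 0.5).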
R combinations with dot ( . ), ~ , and pipe (% %) operator
That line uses the . in three different ways.
         [1]             [2]      [3]
aggregate(. ~ cyl, data = ., FUN = . %>% mean %>% round(2))
Generally speaking you pass in the value from the pipe into your function at a specific location with . but there are some exceptions. One exception is when the . is in a formula. The ~ is used to create formulas in R. The pipe won't change the meaning of the formula, so it behaves like it would without any escaping. For example
aggregate(. ~ cyl, data=mydata)
And that's just because aggregate requires a formula with both a left and right hand side. So the . at [1] just means "all the other columns in the dataset." This use is not at all related to magrittr.
The . at [2] is the value that's being passed in by the pipe. If you have a plain . as a parameter to the function, that's where the value will be placed. So the result of the subset() will go to the data= parameter.
The magrittr library also allows you to define anonymous functions with the . variable. If you have a chain that starts with a ., it's treated like a function. So
. %>% mean %>% round(2)
is the same as
function(x) round(mean(x), 2)
so you're just creating a custom function with the . at [3]
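Putting the three pieces together without magrittr, the call can be sketched in base R with the anonymous function written out. The built-in mtcars data set is used here as an assumed stand-in for whatever was piped in.

```r
# [1] the formula dot: all columns other than cyl
# [2] the data is passed explicitly instead of through the pipe
# [3] the functional sequence written as an ordinary anonymous function
by_cyl <- aggregate(. ~ cyl, data = mtcars,
                    FUN = function(x) round(mean(x), 2))

by_cyl[, c("cyl", "mpg")]   # rounded mean mpg for each cylinder count
```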
What is the meaning of dot dot in this ggplot expression?
Normally, aesthetics are a 1:1 mapping to a column in the input data.frame. In this case, density is an output of the binning for the histogram. So, this is the ggplot2 way of referring to certain derived values that are a by-product of an aggregation, such as the binning for a histogram in this case.
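The idea that density is computed by the binning rather than read from a column can be illustrated with base R's hist(). This is only an analogy and does not use ggplot2; the sample x is arbitrary.

```r
set.seed(1)
x <- rnorm(200)              # arbitrary sample; x has no "density" column

h <- hist(x, plot = FALSE)   # perform the binning without drawing anything
h$counts                     # number of observations per bin
h$density                    # derived value: counts / (n * bin width)

# The densities integrate to 1 over the bins.
sum(h$density * diff(h$breaks))
```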
read.table: percent sign (%) and forward slash (/) in headers replaced by dot (.)
R by default tries to make sure that the dataframe you are importing has syntactically valid names using check.names, which is TRUE by default. It does not allow column names with symbols like %, / (or others as defined in make.names).
We can, however, override this behavior using check.names = FALSE:
read.table(text = "Subject,Exp1_BSL_SDNN,Exp1_BSL_LF/HF,Exp1_BSL_%LF
s1,123,123,123
s2,123,123,123", sep=",", header=TRUE, check.names = FALSE)
# Subject Exp1_BSL_SDNN Exp1_BSL_LF/HF Exp1_BSL_%LF
#1 s1 123 123 123
#2 s2 123 123 123
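For comparison, the renaming that happens with the default check.names = TRUE can be reproduced directly with make.names, which replaces each invalid character with a dot. The header names are taken from the example above; the single data row is shortened for illustration.

```r
# What check.names = TRUE does to the header names.
make.names(c("Subject", "Exp1_BSL_LF/HF", "Exp1_BSL_%LF"))
# [1] "Subject"        "Exp1_BSL_LF.HF" "Exp1_BSL_.LF"

# The same renaming through read.table with its defaults.
df_default <- read.table(text = "Subject,Exp1_BSL_LF/HF,Exp1_BSL_%LF\ns1,123,123",
                         sep = ",", header = TRUE)
names(df_default)
```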
Formatting Decimal places in R
Background: Some answers suggested on this page (e.g., signif, options(digits=...)) do not guarantee that a certain number of decimals are displayed for an arbitrary number. I presume this is a design feature in R whereby good scientific practice involves showing a certain number of digits based on principles of "significant figures". However, in many domains (e.g., APA style, business reports) formatting requirements dictate that a certain number of decimal places are displayed. This is often done for consistency and standardisation purposes rather than being concerned with significant figures.
Solution:
The following code shows exactly two decimal places for the number x.
format(round(x, 2), nsmall = 2)
For example:
format(round(1.20, 2), nsmall = 2)
# [1] "1.20"
format(round(1, 2), nsmall = 2)
# [1] "1.00"
format(round(1.1234, 2), nsmall = 2)
# [1] "1.12"
A more general function is as follows, where x is the number and k is the number of decimals to show. trimws removes any leading white space, which can be useful if you have a vector of numbers.
specify_decimal <- function(x, k) trimws(format(round(x, k), nsmall=k))
E.g.,
specify_decimal(1234, 5)
# [1] "1234.00000"
specify_decimal(0.1234, 5)
# [1] "0.12340"
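The role of trimws is easiest to see with a vector, because format pads every element to a common width. A short sketch with arbitrary numbers (the function is repeated so the snippet runs on its own):

```r
specify_decimal <- function(x, k) trimws(format(round(x, k), nsmall = k))

values <- c(1, 10.5, 100.123)   # arbitrary example numbers

# Without trimws, shorter numbers are left-padded with spaces.
padded <- format(round(values, 2), nsmall = 2)
padded
# [1] "  1.00" " 10.50" "100.12"

# With trimws, each element keeps only its own digits.
specify_decimal(values, 2)
# [1] "1.00"   "10.50"  "100.12"
```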
Discussion of alternatives:
The formatC answers and sprintf answers work fairly well. But they will show negative zeros in some cases which may be unwanted. I.e.,
formatC(c(-0.001), digits = 2, format = "f")
# [1] "-0.00"
sprintf(-0.001, fmt = '%#.2f')
# [1] "-0.00"
One possible workaround to this is as follows:
formatC(as.numeric(as.character(round(-.001, 2))), digits = 2, format = "f")
# [1] "0.00"