R: What Are Operators Like %in% Called and How to Learn About Them

R: What are operators like %in% called and how can I learn about them?

There are several different things going on here with the percent symbol:

Binary Operators

As several have already pointed out, things of the form %%, %in%, %*% are binary operators (respectively modulo, match, and matrix multiply), just like +, -, etc. They are functions that operate on two arguments and that R recognizes as special because of their name structure (the name starts and ends with a %). This allows you to use them in the form:

Argument1 %fun_name% Argument2

instead of the more traditional:

fun_name(Argument1, Argument2)

Keep in mind that the following are equivalent:

10 %% 2 == `%%`(10, 2)
"hello" %in% c("hello", "world") == `%in%`("hello", c("hello", "world"))
10 + 2 == `+`(10, 2)

R just recognizes the standard operators as well as the %x% operators as special and allows you to use them as traditional binary operators if you don't quote them. If you quote them (in the examples above with backticks), you can use them as standard two argument functions.

Custom Binary Operators

The big difference between the standard binary operators and %x% operators is that you can define custom binary operators and R will recognize them as special and treat them as binary operators:

`%samp%` <- function(e1, e2) sample(e1, e2)
1:10 %samp% 2
# [1] 1 9

Here we defined a binary operator version of the sample function.
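As a further sketch of the same idea, a custom operator can wrap any two-argument function. The name %+% below is made up for illustration and is not part of base R:

```r
# Hypothetical operator: glue two strings together (wraps base paste0)
`%+%` <- function(lhs, rhs) paste0(lhs, rhs)

greeting <- "hello, " %+% "world"
print(greeting)

# The backtick-quoted form is an ordinary function call with the same result
same <- identical(greeting, `%+%`("hello, ", "world"))
print(same)
```

Because the result is deterministic (unlike the sample-based example), it is easy to check that the infix and prefix forms agree.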

"%" (Percent) as a token in special functions

The meaning of "%" in functions like sprintf or format is completely different and has nothing to do with binary operators. The key thing to note is that in those functions the % character is part of a quoted string, and not a standard symbol on the command line (i.e. "%" and % are very different). In the context of sprintf, inside a string, "%" is a special character used to signal that the subsequent characters have a special meaning and should not be interpreted as regular text. For example, in:

sprintf("I'm a number: %.2f", runif(3))
# [1] "I'm a number: 0.96" "I'm a number: 0.74" "I'm a number: 0.99"

"%.2f" means a floating point number (f) to be displayed with two decimals (.2). Notice how the "I'm a number: " piece is interpreted literally. The use of "%" allows sprintf users to mix literal text with special instructions on how to represent the other sprintf arguments.
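A few more conversion specifications, as a small sketch with fixed inputs so the output does not depend on random numbers. The variable names are only for illustration:

```r
# %d = integer, %s = string
line1 <- sprintf("%d items cost %s", 3L, "a lot")

# %5.1f = at least 5 characters wide with 1 decimal; %% = a literal percent sign
line2 <- sprintf("Progress: %5.1f%%", 42.123)

print(line1)
print(line2)
```

Note that a literal percent sign inside the format string has to be written as %%, precisely because a single % starts a conversion specification.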

R: What do you call the :: and ::: operators and how do they differ?

It turns out there is a special way to access help info for operators such as these colons: add quotation marks around the operator [e.g., ?'::' or help(":::")].

Also, instead of quotation marks, backticks (i.e., `) also work.

Double Colon Operator and Triple Colon Operator

The answer to the question can be found on the help page for "Double Colon and Triple Colon Operators" (see here).

For a package pkg, pkg::name returns the value of the exported variable name in namespace pkg, whereas pkg:::name returns the value of the internal variable name. The package namespace will be loaded if it was not loaded before the call, but the package will not be attached to the search path.

The difference can be seen by examining the code of each:

> `::`
function (pkg, name) 
{
    pkg <- as.character(substitute(pkg))
    name <- as.character(substitute(name))
    getExportedValue(pkg, name)
}
<bytecode: 0x00000000136e2ae8>
<environment: namespace:base>

> `:::`
function (pkg, name) 
{
    pkg <- as.character(substitute(pkg))
    name <- as.character(substitute(name))
    get(name, envir = asNamespace(pkg), inherits = FALSE)
}
<bytecode: 0x0000000013482f50>
<environment: namespace:base>

:: calls getExportedValue(pkg, name), returning the value of the exported variable name in the package's namespace.

::: calls get(name, envir = asNamespace(pkg), inherits = FALSE), searching for the object name in the Namespace environment of the package, and returning the value of the internal variable name.
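The difference can also be checked at runtime. The sketch below uses the stats package only because it ships with R; rather than hard-coding the name of an internal object (which may vary between R versions), it picks one programmatically:

```r
ns <- asNamespace("stats")
exported <- getNamespaceExports("stats")

# Everything that lives in the namespace but is not exported
internal <- setdiff(ls(ns, all.names = TRUE), exported)

# An exported object is reachable both ways, and it is the same object
same_sd <- identical(stats::sd, stats:::sd)

# A non-exported name: the lookup used by `::` fails, the one used by `:::` works
nm <- internal[1]
via_double <- tryCatch(
  getExportedValue("stats", nm),
  error = function(e) "not exported"
)
via_triple_ok <- exists(nm, envir = ns, inherits = FALSE)

print(same_sd)
print(via_double)
print(via_triple_ok)
```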

So, what exactly is a namespace?

This site does a good job of explaining the concept of namespaces in R. Importantly:

As the name suggests, namespaces provide “spaces” for “names”. They provide a context for looking up the value of an object associated with a name.

What is the %<-% operator in R?

It is a multiple assignment operator from the zeallot package. From its documentation:

%<-% and %->% invisibly return value.

These operators are used primarily for their assignment side-effect.
%<-% and %->% assign into the environment in which they are evaluated.

I.e., it creates multiple objects from a single line of code:

> library(zeallot)
> c(x, y, z) %<-% c(1, 3, 5)
> x
[1] 1
> y
[1] 3
> z
[1] 5
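To make the mechanism less mysterious, here is a rough base-R sketch of how such an operator could be built. The name %unpack% is hypothetical, and this is not zeallot's actual implementation, which handles many more cases (nested structures, skipping values, and so on):

```r
# Hypothetical operator: assign the elements of rhs to the names listed in c(...)
`%unpack%` <- function(lhs, rhs) {
  caller <- parent.frame()
  # substitute() captures c(x, y, z) unevaluated; drop the leading `c`
  targets <- vapply(as.list(substitute(lhs))[-1], as.character, character(1))
  stopifnot(length(targets) == length(rhs))
  for (i in seq_along(targets)) {
    assign(targets[i], rhs[[i]], envir = caller)
  }
  invisible(rhs)
}

c(x, y, z) %unpack% c(1, 3, 5)
print(c(x, y, z))
```

The left-hand side is never evaluated, which is why x, y and z do not need to exist beforehand.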

do.call-like function for binary operators in R

For this case in particular, this would do the same trick:

> apply(foo, 1, function(x) Reduce("|", x))
[1]  TRUE FALSE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE

I'm not sure if this will generalize to whatever real problem you have in mind, but it feels like something related to Reduce is what you have in mind, no?
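For a self-contained illustration, here is the same pattern on a small made-up logical matrix (this foo is an assumption, not the data from the question), together with Reduce applied to another binary operator:

```r
# Made-up 3 x 2 logical matrix; rows are (TRUE, FALSE), (FALSE, FALSE), (FALSE, TRUE)
foo <- matrix(c(TRUE, FALSE, FALSE, FALSE, FALSE, TRUE), nrow = 3)

# Fold `|` across each row: TRUE if any element of the row is TRUE
row_any <- apply(foo, 1, function(x) Reduce("|", x))
print(row_any)

# Reduce works with any binary function, e.g. `+`
total <- Reduce(`+`, 1:4)
running <- Reduce(`+`, 1:4, accumulate = TRUE)
print(total)
print(running)
```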

Are binary operators / infix functions in R generic? And how to make use of?

Yes, this is possible: use '+.<class name>' <- function().

Examples

'+.product' <- function(a, b) a * b
'+.expo' <- function(a, b) a ^ b

m <- 2; class(m) <- "product"
n <- 3; class(n) <- "product"

r <- 2; class(r) <- "expo"
s <- 3; class(s) <- "expo"

m + n # gives 6
r + s # gives 8

Safety notes

The newly defined functions will be called if at least one of the arguments is of the corresponding class: m + 4 gives you 2 * 4 = 8 and not 2 + 4 = 6. If the two arguments have classes with different methods (like for r + m), R issues an "Incompatible methods" warning and falls back to the built-in +. So all in all, be sure that you really want to put a new function behind such a basic function as +.
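The dispatch behaviour can be checked directly. The sketch below redefines the two methods so that it runs on its own, and uses different values (3 and 2) so that the three possible results are distinguishable. In the R versions I am aware of, mixing two classes that each have their own + method produces an "Incompatible methods" warning and the built-in + is used:

```r
`+.product` <- function(a, b) a * b
`+.expo` <- function(a, b) a ^ b

p3 <- structure(3, class = "product")
e2 <- structure(2, class = "expo")

# Only one argument has a class with a method: that method is used (3 * 4)
one_sided <- as.numeric(p3 + 4)

# Both arguments have different methods: record the warning and the result
warned <- FALSE
mixed <- withCallingHandlers(
  e2 + p3,
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)
mixed_value <- as.numeric(mixed)  # built-in 2 + 3, not 2 ^ 3 or 2 * 3

print(one_sided)
print(warned)
print(mixed_value)
```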

What are the differences between = and <- assignment operators?

What are the differences between the assignment operators = and <- in R?

As your example shows, = and <- have slightly different operator precedence (which determines the order of evaluation when they are mixed in the same expression). In fact, ?Syntax in R gives the following operator precedence table, from highest to lowest:

…
‘-> ->>’           rightwards assignment
‘<- <<-’           assignment (right to left)
‘=’                assignment (right to left)
…

But is this the only difference?

Since you were asking about the assignment operators: yes, that is the only difference. However, you would be forgiven for believing otherwise. Even the R documentation of ?assignOps claims that there are more differences:

The operator <- can be used anywhere,
whereas the operator = is only allowed at the top level (e.g.,
in the complete expression typed at the command prompt) or as one
of the subexpressions in a braced list of expressions.

Let’s not put too fine a point on it: the R documentation is wrong. This is easy to show: we just need to find a counter-example of the = operator that isn’t (a) at the top level, nor (b) a subexpression in a braced list of expressions (i.e. {…; …}). — Without further ado:

x
# Error: object 'x' not found
sum((x = 1), 2)
# [1] 3
x
# [1] 1

Clearly we’ve performed an assignment, using =, outside of contexts (a) and (b). So, why has the documentation of a core R language feature been wrong for decades?

It’s because in R’s syntax the symbol = has two distinct meanings that get routinely conflated (even by experts, including in the documentation cited above):

The first meaning is as an assignment operator. This is all we’ve talked about so far.
The second meaning isn’t an operator but rather a syntax token that signals named argument passing in a function call. Unlike the = operator it performs no action at runtime, it merely changes the way an expression is parsed.

So how does R decide whether a given usage of = refers to the operator or to named argument passing? Let’s see.

In any piece of code of the general form …

‹function_name›(‹argname› = ‹value›, …)
‹function_name›(‹args›, ‹argname› = ‹value›, …)

… the = is the token that defines named argument passing: it is not the assignment operator. Furthermore, = is entirely forbidden in some syntactic contexts:

if (‹var› = ‹value›) …
while (‹var› = ‹value›) …
for (‹var› = ‹value› in ‹value2›) …
for (‹var1› in ‹var2› = ‹value›) …

Any of these will raise an error “unexpected '=' in ‹bla›”.

In any other context, = refers to the assignment operator call. In particular, merely putting parentheses around the subexpression makes any of the above (a) valid, and (b) an assignment. For instance, the following performs assignment:

median((x = 1 : 10))

But also:

if (! (nf = length(from))) return()

(Now you might object that such code is atrocious, and you may be right. But I took this code from the base::file.copy function, replacing <- with =; it’s a pervasive pattern in much of the core R codebase.)
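The two meanings of = can be told apart by inspecting what the parser produces and by checking whether a variable was actually created. This is only a sketch; f is a placeholder name that is never called, and env is a scratch environment used to keep the demonstration contained:

```r
# Named argument passing: the call has an argument named "x", no `=` call inside
call_named <- quote(f(x = 1))
arg_names <- names(call_named)            # "" and "x"

# Extra parentheses: the argument is unnamed and contains a call to `=`
call_assign <- quote(f((x = 1)))
inner <- call_assign[[2]][[2]]            # strip the enclosing `(` call
inner_op <- as.character(inner[[1]])      # "="

# Runtime effect, evaluated in a scratch environment
env <- new.env()
evalq(median(x = 1:10), env)
before <- exists("x", envir = env, inherits = FALSE)   # nothing assigned
evalq(median((x = 1:10)), env)
after <- exists("x", envir = env, inherits = FALSE)    # x now exists

print(arg_names)
print(inner_op)
print(c(before = before, after = after))
```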

The original explanation by John Chambers, which the R documentation is probably based on, actually explains this correctly:

[= assignment is] allowed in only two places in the grammar: at the top level (as a complete program or user-typed expression); and when isolated from surrounding logical structure, by braces or an extra pair of parentheses.

In sum, by default the operators <- and = do the same thing. But either of them can be overridden separately to change its behaviour. By contrast, <- and -> (left-to-right assignment), though syntactically distinct, always call the same function. Overriding one also overrides the other. Knowing this is rarely practical but it can be used for some fun shenanigans.
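Both claims can be verified with a small experiment. The sketch below shadows `<-` inside a local() block only, so nothing outside the block is affected; the replacement function is a made-up stand-in that performs no assignment at all:

```r
# The parser turns rightwards assignment into a call to `<-`
same_call <- identical(quote(1 -> x), quote(x <- 1))

demo <- local({
  # Shadow `<-` locally; defined with `=` to make clear that `=` is untouched
  `<-` = function(lhs, rhs) "intercepted"

  via_left  = (y <- 1)   # calls the local `<-`
  via_right = (1 -> y)   # parsed as `<-`(y, 1), so also intercepted
  z = 2                  # `=` still performs a normal assignment

  list(via_left = via_left, via_right = via_right, z = z)
})

print(same_call)
print(demo)
```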

Define new operator for tilde / formula

I guess, if you accept the comment, I can make an answer out of it:

~ is an operator in R like +, -, /, *. Although it is possible to use many kinds of characters in your variable names by using backticks `xxx` or quotes "xxx", you also need to access them with backticks (see ?Reserved). (I'm gonna use quotes instead of backticks here, but consider using backticks for a more accepted style.)

R is a functional programming language and therefore you can access every single language statement as a function, e.g. a + b is the same as "+"(a, b). When you write a + b it is just syntactic sugar - language-wise it is translated into a primitive function call with two arguments.

To complicate things, there is an order of evaluation. So if you write a~~b it gets translated into "~"(a, ~b). This is because ~ is a primitive operator designed as a single character. You can still define the function "~~" <- function(a,b) {a + b}, but you can only call it directly as "~~"(a,b) for it to work.

On the other hand, you need to be able to specify what a binary operator looks like. Having defined a function "asdf" <- function(a,b) {a + b} is not enough, and this will not work: a asdf b

R has a mechanism for defining binary operators (see R: What are operators like %in% called and how can I learn about them?); many packages use it, for example magrittr's %>% or foreach's %dopar%. Thus it is better to stick to the binary operator syntax using %, i.e. `%~~%` <- function(a,b) {a+b}. Then you can easily use it with the syntactic sugar a %~~% b.
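Both points can be checked in a few lines: the %-delimited operator works as an infix, and the parser really does read a ~~ b as a formula whose right-hand side is another formula. The variable names are only for illustration:

```r
# A user-defined infix operator must be wrapped in % signs
`%~~%` <- function(a, b) a + b
infix_result <- 2 %~~% 3

# How the parser sees a ~~ b: a call to `~` with arguments a and ~b
parsed <- quote(a ~~ b)
outer_op <- as.character(parsed[[1]])
rhs_is_formula <- identical(parsed[[3]], quote(~b))

print(infix_result)
print(outer_op)
print(rhs_is_formula)
```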

Strange stuff, I agree. As for magic tricks: try this at home "for"(a, 1:10, {print(a)}). Bonus question: why is a visible in the parent frame?

Assigning operators into an R variable

Check out do.call, which takes the name of a function as an argument. With

operator <- "+"
do.call(operator, list(2,3))

you will get 5 as the result.
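A slightly fuller sketch of the same idea: since an operator is just a function that can be looked up by name, the name can be stored in a variable, looped over, or resolved once with match.fun. The values 6 and 3 are arbitrary:

```r
operator <- "+"
five <- do.call(operator, list(2, 3))

# Apply several operators, given by name, to the same pair of arguments
results <- sapply(c("+", "-", "*"), function(op) do.call(op, list(6, 3)))

# match.fun() turns the name into the function itself
op_fun <- match.fun("*")
product <- op_fun(6, 3)

print(five)
print(results)
print(product)
```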

In your example:

test <- function(items, operator = "+"){
  bank_alpha <- matrix(ncol=6)
  colnames(bank_alpha) <- colnames(bank_alpha, do.NULL = FALSE, prefix = "Q")
  colnames(bank_alpha)[6] <- "A"
  alphabet <- LETTERS[seq(1:26)]

  for (i in 1:items) {
    item <- c(alphabet[i], alphabet[do.call(operator, list(i,1))], alphabet[do.call(operator, list(i,2))], alphabet[do.call(operator, list(i,3))], alphabet[do.call(operator, list(i,4))], alphabet[do.call(operator, list(i,5))])
    bank_alpha <- rbind(bank_alpha, item)
    bank_alpha <- na.omit(bank_alpha)
  }
  return(bank_alpha)
}

test(items=4, operator = "*")

Beware, "-" doesn't make sense in this case.

How to use the %.% operator in R (EDIT: operator deprecated in 2014)

I think Hadley would be the best person to explain to you, but I will give it a shot.

%.% is a binary operator called the chain operator. In R you can pretty much define any binary operator of your own with the special character %. From what I have seen, we mostly use it to make "chainable" syntax easier (like x+y, much better than sum(x,y)). You can do really cool stuff with them, see this cool example here.

What is the purpose of %.% in dplyr? To make it easier for you to express yourself, reducing the gap between what you want to do and how you express it.

Taking the example from the introduction to dplyr, let's suppose you want to group flights by year, month and day, select those variables plus the delays in arrival and departure, summarise these by the mean and then filter just those delays over 30. If there were no %.%, you would have to write like this:

filter(
  summarise(
    select(
      group_by(hflights, Year, Month, DayofMonth),
      Year:DayofMonth, ArrDelay, DepDelay
    ),
    arr = mean(ArrDelay, na.rm = TRUE),
    dep = mean(DepDelay, na.rm = TRUE)
  ),
  arr > 30 | dep > 30
)

It does the job. But it is pretty difficult to express yourself and to read it. Now, you can write the same thing with a more friendly syntax using the chain operator %.%:

hflights %.%
  group_by(Year, Month, DayofMonth) %.%
  select(Year:DayofMonth, ArrDelay, DepDelay) %.%
  summarise(
    arr = mean(ArrDelay, na.rm = TRUE),
    dep = mean(DepDelay, na.rm = TRUE)
  ) %.%
  filter(arr > 30 | dep > 30)

It is easier both to write and read!

And how does that work?

Let's take a look at the definitions. First for %.%:

function (x, y) 
{
    chain_q(list(substitute(x), substitute(y)), env = parent.frame())
}

It uses another function called chain_q. So let's look at it:

function (calls, env = parent.frame()) 
{
    if (length(calls) == 0) 
        return()
    if (length(calls) == 1) 
        return(eval(calls[[1]], env))
    e <- new.env(parent = env)
    e$`__prev` <- eval(calls[[1]], env)
    for (call in calls[-1]) {
        new_call <- as.call(c(call[[1]], quote(`__prev`), as.list(call[-1])))
        e$`__prev` <- eval(new_call, e)
    }
    e$`__prev`
}

What does that do?

To simplify things, let's assume you called: group_by(hflights,Year, Month, DayofMonth) %.% select(Year:DayofMonth, ArrDelay, DepDelay).

Your calls x and y are then group_by(hflights, Year, Month, DayofMonth) and select(Year:DayofMonth, ArrDelay, DepDelay), respectively. The function creates a new environment called e (e <- new.env(parent = env)) and stores in it an object called __prev that holds the result of evaluating the first call (e$`__prev` <- eval(calls[[1]], env)). Then, for each remaining call, it builds a new call whose first argument is the previous result, that is __prev. In our case that would be select(`__prev`, Year:DayofMonth, ArrDelay, DepDelay). This is how it "chains" the calls inside the loop.

Since you can use binary operators one over another, you actually can use this syntax to express very complex manipulations in a very readable way.
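The core trick can be reproduced in base R with a stripped-down operator. The name %then% and the placeholder .prev are hypothetical, and this sketch is much simpler than dplyr's chain_q: it evaluates the left-hand side normally and only rewrites the right-hand call.

```r
# Hypothetical chain operator: insert the left-hand value as the first
# argument of the right-hand call, then evaluate that call
`%then%` <- function(x, y) {
  call <- substitute(y)                      # e.g. head(3), unevaluated
  new_call <- as.call(c(
    list(call[[1]], quote(.prev)),           # function name, then placeholder
    as.list(call)[-1]                        # remaining original arguments
  ))
  env <- new.env(parent = parent.frame())
  env$.prev <- x                             # value of the left-hand side
  eval(new_call, env)
}

# Reads left to right: take the vector, take square roots, sum them
total <- c(4, 9, 16) %then% sqrt() %then% sum()

# Extra arguments stay in place: becomes head(.prev, 3)
first_three <- 1:10 %then% head(3)

print(total)
print(first_three)
```

Because %-operators are left-associative, a %then% f() %then% g() is evaluated as (a %then% f()) %then% g(), which is what makes the chain read in order.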