The R %in% Operator

Use of $ and %% operators in R

You are not really pulling a value from a function but rather from the list object that the function returns. $ is actually an infix operator that takes two arguments: the values preceding and following it. It is a convenience function that uses non-standard evaluation of its second argument. It's called non-standard because the unquoted characters following $ are quoted first and then used to extract a named element from the first argument.

t.test # is the function
t.test(x) # is a named list with one of the names being "p.value"

The value can be pulled in one of three ways:

t.test(x)$p.value
t.test(x)[['p.value']] # numeric vector
t.test(x)['p.value'] # a list with one item

my.name.for.p.val <- 'p.value'
t.test(x)[[ my.name.for.p.val ]]
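
To make the quoting behaviour concrete, here is a minimal sketch. It uses a plain named list in place of a real t.test result, and the names res and nm are made up for illustration.

```r
# A plain named list stands in for the object returned by t.test(x)
res <- list(statistic = 2.5, p.value = 0.03)

# `$` is an ordinary two-argument function; the infix form is shorthand for this call
via_infix  <- res$p.value
via_prefix <- `$`(res, p.value)

# `$` quotes its second argument, so a variable holding the name is NOT looked up
nm <- "p.value"
via_dollar_var <- res$nm      # looks for an element literally called "nm"
via_brackets   <- res[[nm]]   # [[ evaluates nm first, then extracts "p.value"

print(via_infix)
print(via_prefix)
print(is.null(via_dollar_var))
print(via_brackets)
```

This is why [[ is the one to reach for whenever the element name is stored in a variable.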

When you surround a set of characters with flanking "%" signs, you can create your own vectorized infix function. If you wanted a pmax for which the default was na.rm=TRUE, do this:

'%mypmax%' <- function(x,y) pmax(x,y, na.rm=TRUE)

And then use it without quotes:

> c(1:10, NA) %mypmax% c(NA,10:1)
[1] 1 10 9 8 7 6 7 8 9 10 1
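
The same pattern works for any two-argument function. As another sketch, here is a "not in" operator built on %in%; the name %nin% is my own choice for illustration and is not part of base R.

```r
# Hypothetical "not in" operator built on top of %in%
`%nin%` <- function(x, table) !(x %in% table)

not_in <- c("a", "b", "c") %nin% c("b", "z")
print(not_in)

# The infix form is just another way to call the same two-argument function
same <- identical(not_in, `%nin%`(c("a", "b", "c"), c("b", "z")))
print(same)
```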

Difference between the == and %in% operators in R

%in% is value matching. It is built on match(), which "returns a vector of the positions of (first) matches of its first argument in its second"; %in% itself returns a logical vector saying whether each element of its first argument has a match in the second (see help('%in%')). This means you can compare vectors of different lengths to see whether elements of one vector match at least one element of another. The length of the output equals the length of the vector being compared (the first one).

1:2 %in% rep(1:2,5)
#[1] TRUE TRUE

rep(1:2,5) %in% 1:2
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

# Note: the output is longer in the second case
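
A small sketch of how match() and %in% relate, using made-up vectors x and lookup: match() reports positions, while %in% only reports whether a position was found.

```r
x <- c("b", "z", "a")
lookup <- c("a", "b", "c")

pos <- match(x, lookup)   # position of the first match, NA where there is none
hit <- x %in% lookup      # TRUE/FALSE for each element of x

print(pos)
print(hit)

# %in% carries the same information as "match() found something"
print(identical(hit, !is.na(pos)))
```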

== is a logical operator meant to test whether two things are exactly equal. If the vectors are of equal length, they are compared element-wise. If not, the shorter vector is recycled. The length of the output equals the length of the longer vector.

1:2 == rep(1:2,5)
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

rep(1:2,5) == 1:2
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE

1:10 %in% 3:7
#[1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE

# is the same as

sapply(1:10, function(a) any(a == 3:7))
#[1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE
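
Recycling is what makes == a poor substitute for %in% when the right-hand side has more than one value. A minimal sketch with a made-up month vector:

```r
month <- c(12, 11, 11, 12)

# == recycles c(11, 12) and compares position by position:
# 12 vs 11, 11 vs 12, 11 vs 11, 12 vs 12
by_position <- month == c(11, 12)

# %in% asks, for each month, whether it appears anywhere in c(11, 12)
by_membership <- month %in% c(11, 12)

print(by_position)
print(by_membership)
```

Every element of month is 11 or 12, yet the == version reports FALSE for the first two because they happen to line up with the "wrong" recycled value.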

NOTE: If possible, try to use identical or all.equal instead of ==.
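
A short sketch of why that matters, using the usual floating-point example (the variable names are arbitrary):

```r
a <- 0.1 + 0.2
b <- 0.3

exact_eq  <- a == b                    # exact comparison of doubles
approx_eq <- isTRUE(all.equal(a, b))   # comparison within a small tolerance
same_obj  <- identical(c(1, 2), c(1, 2))
int_vs_dbl <- identical(1L, 1)         # identical() also compares types

print(exact_eq)
print(approx_eq)
print(same_obj)
print(int_vs_dbl)
```

all.equal returns a description of the difference rather than FALSE when the values differ, which is why it is wrapped in isTRUE here.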

How to use '%in%' operator in R?

An answer has already been given, but for a bit more detail, simply look at the result of %in%:

df$col1 %in% myvector
# [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE TRUE TRUE

This one is correct: when you subset df with it, you keep the rows where the value is TRUE, that is rows 5, 9, 12 and 13.

versus

myvector %in% df$col1
# [1] TRUE TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE

This one goes wrong: when you subset df with it, you tell R to keep rows 1, 2, 6 and 7, and because the result has only 10 elements it is recycled for rows 11, 12 and 13 as TRUE, TRUE, FALSE, so you get rows 11 and 12 in your subset as well.
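
Here is a smaller, self-contained sketch of the same effect. The data frame and vector are invented for illustration (6 rows and 3 values instead of 13 and 10), and it assumes a base data.frame, where a short logical index is recycled.

```r
df <- data.frame(id = 1:6,
                 col1 = c("a", "b", "c", "d", "e", "f"),
                 stringsAsFactors = FALSE)
myvector <- c("b", "e", "z")

keep_right <- df$col1 %in% myvector   # one TRUE/FALSE per row of df
keep_wrong <- myvector %in% df$col1   # one TRUE/FALSE per element of myvector

right <- df[keep_right, ]
wrong <- df[keep_wrong, ]   # the length-3 index is recycled across 6 rows

print(right$id)
print(wrong$id)
```

The second subset silently returns rows that have nothing to do with myvector, which is the failure described above.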

How do you read the %in% operator in plain English?

I think your disconnect is understanding how to apply "in" to a vector. You wrote that you want to read it as "Look for 11 and 12 in the month column." You can indeed think of it that way. Your example was:

nov_dec <- filter(flights, month %in% c(11, 12))

And that could be expressed in plain English as:

Give me all the flights where one of the values in c(11, 12) is in the month column

But we could also say that 11 and 12 are "in" the vector c(11, 12). That's what the left-to-right reading would be:

Give me all the flights whose month is in the vector c(11, 12).

Or, expressed slightly differently and more verbosely:

Give me all the flights whose month is equal to one of the values in the vector c(11, 12)

This is conceptually similar to using a bunch of | operators in a row (month == 11 | month == 12), but it's best not to think of those as exactly equivalent. Instead of explicitly comparing x to every value in y, you're asking the question "is x equal to one of the values in y?" That's different in the same way that saying "please turn off the lights" is different than saying "please walk over to that plate on the wall and pull the little stick on it downwards." It's expressing what you want instead of how to figure it out, which makes your code more readable, and code is read more often than it's written, so that's important!!!
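
One concrete place where the two spellings differ is missing values. A minimal sketch on a plain vector (the values are made up, and no dplyr is needed):

```r
month <- c(11, 3, NA, 12)

with_or <- month == 11 | month == 12   # NA stays NA
with_in <- month %in% c(11, 12)        # NA is simply "not in" the set

print(with_or)
print(with_in)
```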

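Now I'm getting way out of my area - again, I don't know exactly what R does here - but the underlying method of answering the question might also be different. The help page defines %in% in terms of match(), so rather than comparing x to each value of y in turn, R can use whatever lookup strategy match() implements (hashing, as far as I know, rather than a binary search).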

The R %in% operator

To check whether every element of one vector is in another, you can wrap the %in% test in all:

> all(1:6 %in% 0:36)
[1] TRUE
> all(1:60 %in% 0:36)
[1] FALSE

On a similar note, if you want to check whether any of the elements is TRUE you can use any

> any(1:6 %in% 0:36)
[1] TRUE
> any(1:60 %in% 0:36)
[1] TRUE
> any(50:60 %in% 0:36)
[1] FALSE
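
When all() comes back FALSE, the same logical vector can tell you which elements are the offenders. A small sketch with made-up ranges:

```r
x <- 30:40
allowed <- 0:36

all_ok <- all(x %in% allowed)
missing_vals <- x[!(x %in% allowed)]   # elements of x that are not in allowed

print(all_ok)
print(missing_vals)
```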

What does !! operator mean in R

The !! and {{ operators are placeholders to flag a variable as having been quoted. They are usually only needed if you intend to program with the tidyverse.
The tidyverse likes to leverage NSE (non-standard evaluation) in order to reduce the amount of repetition. The most frequent application is to the "data.frame" class, in which expressions/symbols are evaluated in the context of a data.frame before other scopes are searched.
For this to work, some special functions (like those in the dplyr package) have arguments that are quoted. To quote an expression is to save the symbols that make up the expression and prevent its evaluation (in the tidyverse they use "quosures", which are like quoted expressions except that they also hold a reference to the environment in which the expression was created).
While NSE is great for interactive use, it is notably harder to program with.
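
As a rough illustration of what "quoting" and "evaluating in the context of a data.frame" mean, here is a base R sketch using substitute() and eval(). This is not how dplyr is implemented (it uses quosures); pull_col is a hypothetical helper written only for this example.

```r
# Hypothetical helper: pull one column, named without quotes
pull_col <- function(.data, col) {
  col_expr <- substitute(col)             # quote: capture the argument unevaluated
  eval(col_expr, .data, parent.frame())   # evaluate it with the columns in scope first
}

df <- data.frame(a = 1:3, b = c("x", "y", "z"))

plain   <- pull_col(df, a)
doubled <- pull_col(df, a * 2)

print(plain)
print(doubled)
```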
Let's consider dplyr::select:

library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union

iris <- as_tibble(iris)

my_select <- function(.data, col) {
  select(.data, col)
}

select(iris, Species)
#> # A tibble: 150 × 1
#> Species
#> <fct>
#> 1 setosa
#> 2 setosa
#> 3 setosa
#> 4 setosa
#> 5 setosa
#> 6 setosa
#> 7 setosa
#> 8 setosa
#> 9 setosa
#> 10 setosa
#> # … with 140 more rows
my_select(iris, Species)
#> Error: object 'Species' not found

We encounter an error because within the scope of my_select the col argument is evaluated with standard evaluation and cannot find a variable named Species.

If we create a variable named Species in the global environment, we see that the function works, but it isn't following the conventions of the tidyverse. In fact, a note is printed to inform you that this is ambiguous use.

Species <- "Sepal.Width"
my_select(iris, Species)
#> Note: Using an external vector in selections is ambiguous.
#> ℹ Use `all_of(col)` instead of `col` to silence this message.
#> ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
#> This message is displayed once per session.
#> # A tibble: 150 × 1
#> Sepal.Width
#> <dbl>
#> 1 3.5
#> 2 3
#> 3 3.2
#> 4 3.1
#> 5 3.6
#> 6 3.9
#> 7 3.4
#> 8 3.4
#> 9 2.9
#> 10 3.1
#> # … with 140 more rows
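
If the intent really is to pass column names as strings, the note above points to all_of(). A minimal sketch, assuming dplyr is installed; my_select_chr and its cols argument are hypothetical names for this example.

```r
library(dplyr)

# Hypothetical wrapper that takes column names as a character vector
my_select_chr <- function(.data, cols) {
  select(.data, all_of(cols))   # all_of() marks cols as an external vector of names
}

out <- my_select_chr(iris, c("Species", "Sepal.Width"))
print(names(out))
```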

To remedy this, we need to prevent evaluation with enquo() and unquote with !!, or just use {{.

my_select2 <- function(.data, col) {
  col_quo <- enquo(col)
  select(.data, !!col_quo) # look up whatever symbols were passed to the `col` argument
}
# `{{` lets you skip the `enquo()` step.
my_select3 <- function(.data, col) {
  select(.data, {{col}})
}

my_select2(iris, Species)
#> # A tibble: 150 × 1
#> Species
#> <fct>
#> 1 setosa
#> 2 setosa
#> 3 setosa
#> 4 setosa
#> 5 setosa
#> 6 setosa
#> 7 setosa
#> 8 setosa
#> 9 setosa
#> 10 setosa
#> # … with 140 more rows
my_select3(iris, Species)
#> # A tibble: 150 × 1
#> Species
#> <fct>
#> 1 setosa
#> 2 setosa
#> 3 setosa
#> 4 setosa
#> 5 setosa
#> 6 setosa
#> 7 setosa
#> 8 setosa
#> 9 setosa
#> 10 setosa
#> # … with 140 more rows

In summary, you really only need !! and {{ if you are trying to apply NSE programmatically or do some type of programming on the language.

!!! is used to splice a list/vector of some sort into arguments of some quoting expression.

library(rlang)
quo_let <- quo(paste(!!!LETTERS))
quo_let
#> <quosure>
#> expr: ^paste("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L",
#> "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y",
#> "Z")
#> env: global
eval_tidy(quo_let)
#> [1] "A B C D E F G H I J K L M N O P Q R S T U V W X Y Z"

Created on 2021-08-30 by the reprex package (v2.0.1)
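
For comparison, the closest base R analogue I know of is do.call(), which passes each element of a list as a separate argument. It is only an analogy to !!! and not the rlang mechanism itself; the variable names are arbitrary.

```r
# do.call() hands each list element to paste() as its own argument,
# much as !!! splices the elements into the call
spliced <- do.call(paste, as.list(LETTERS[1:5]))

# Without splicing, the whole vector is a single argument
unspliced <- paste(LETTERS[1:5])

print(spliced)
print(length(unspliced))
```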

Operator in R argument

<<- and <- are both assignment operators, but they are subtly different.

<- only applies to the local environment where it is used, so if you use it to assign a variable inside a function, that variable will not be available outside that function.

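If you use <<- inside a function to assign to a name that does not exist in any enclosing environment, it will create that variable in the global environment. If a variable with that name already exists in a function that contains your function (or anywhere else along the chain of parent environments), <<- assigns to that existing variable instead. Note that the search starts in the parent of the current environment, so a local variable of the same name inside your own function is left untouched.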

It is almost always a bad idea to assign to the global environment from within a function. If you absolutely have to write variables from inside a function, it is better to use assign to write the variable to another persistent environment.

local_assign <- function() {a <- 1;}
global_assign <- function() {b <<- 1;}

local_assign()
global_assign()
a
# Error: object 'a' not found
b
# [1] 1
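
Two further sketches, with made-up names throughout: a closure where <<- updates a variable in the enclosing function rather than the global environment, and assign() writing into an explicit environment as suggested above.

```r
make_counter <- function() {
  count <- 0               # lives in make_counter's environment
  function() {
    count <<- count + 1    # updates the enclosing `count`, not a global one
    count
  }
}

counter <- make_counter()
counter()
second <- counter()
print(second)

# Writing to a dedicated environment instead of the global one
store <- new.env()
save_result <- function(value) assign("result", value, envir = store)
save_result(42)
stored <- get("result", envir = store)
print(stored)
```

Because count is found in the enclosing function, nothing named count is created in the global environment by the counter.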

What do the %op% operators in R mean? For example %in%?

Put quotes around it to find the help page. Either of these works:

> help("%in%")
> ?"%in%"

Once you get to the help page, you'll see that

‘%in%’ is currently defined as

‘"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0’


Since time is a generic, I don't know what time(X2) returns without knowing what X2 is. But, %in% tells you which items from the left hand side are also in the right hand side.

> c(1:5) %in% c(3:8)
[1] FALSE FALSE TRUE TRUE TRUE

See also, intersect

> intersect(c(1:5), c(3:8))
[1] 3 4 5
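
The two are not interchangeable when the left-hand side has repeated values: subsetting with %in% keeps duplicates, while intersect treats its inputs as sets. A small sketch with invented vectors:

```r
x <- c(1L, 3L, 3L, 5L)
y <- 3:8

kept   <- x[x %in% y]      # keeps every element of x that is in y
common <- intersect(x, y)  # set intersection, duplicates removed

print(kept)
print(common)
```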

What is the %<-% operator in R?

It is a multiple assignment operator from the zeallot package:

%<-% and %->% invisibly return value.

These operators are used primarily for their assignment side-effect.
%<-% and %->% assign into the environment in which they are evaluated.

That is, it creates multiple objects from a single line of code:

> library(zeallot)
> c(x, y, z) %<-% c(1, 3, 5)
> x
[1] 1
> y
[1] 3
> z
[1] 5

