Use of $ and %% operators in R
You are not really pulling a value from a function but rather from the list object that the function returns. $ is actually an infix operator that takes two arguments, the values preceding and following it. It is a convenience function that uses non-standard evaluation of its second argument. It's called non-standard because the unquoted characters following $ are first quoted before being used to extract a named element from the first argument.
t.test # is the function
t.test(x) # is a named list with one of the names being "p.value"
The value can be pulled in one of three ways:
t.test(x)$p.value
t.test(x)[['p.value']] # numeric vector
t.test(x)['p.value'] # a list with one item
my.name.for.p.val <- 'p.value'
t.test(x)[[ my.name.for.p.val ]]
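To see the non-standard evaluation at work without running a t-test, here is a minimal sketch using a hand-made list. The names res and key are hypothetical stand-ins for the t.test(x) result and a stored element name.

```r
# Hypothetical stand-in for the named list returned by t.test(x)
res <- list(statistic = 2.1, p.value = 0.04)
key <- "p.value"

via_dollar  <- res$p.value        # infix form
via_prefix  <- `$`(res, p.value)  # the same call written in prefix form
via_bracket <- res[[key]]         # [[ evaluates `key` before extracting
not_found   <- res$key            # $ quotes `key`, so it looks for an element named "key"

print(c(via_dollar, via_prefix, via_bracket))
print(is.null(not_found))
```

The last line is the reason the stored-name case above has to go through [[ rather than $.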
When you surround a set of characters with flanking "%"-signs you can create your own vectorized infix function. If you wanted a pmax for which the default was na.rm=TRUE, do this:
'%mypmax%' <- function(x,y) pmax(x,y, na.rm=TRUE)
And then use it without quotes:
> c(1:10, NA) %mypmax% c(NA,10:1)
[1] 1 10 9 8 7 6 7 8 9 10 1
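As another small sketch of the same idea, the operator below negates %in%. The name %notin% is made up for this example and is not part of base R; it also shows that a user-defined infix can still be called in ordinary prefix form when its name is wrapped in backticks.

```r
# %notin% is a hypothetical operator name chosen for this sketch
`%notin%` <- function(x, table) !(x %in% table)

res_infix  <- c(1, 5, 9) %notin% c(1, 2, 3)         # infix use
res_prefix <- `%notin%`(c(1, 5, 9), c(1, 2, 3))     # same call, prefix form

print(res_infix)
print(identical(res_infix, res_prefix))
```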
Difference between the == and %in% operators in R
%in% does value matching. It is built on match(), which "returns a vector of the positions of (first) matches of its first argument in its second" (see help('%in%')), and %in% itself returns a logical vector indicating whether each element of its left operand has a match. This means you can compare vectors of different lengths to see if elements of one vector match at least one element in another. The length of the output will be equal to the length of the vector being compared (the first one).
1:2 %in% rep(1:2,5)
#[1] TRUE TRUE
rep(1:2,5) %in% 1:2
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
# Note that the output is longer in the second case
== is a logical operator meant to compare whether two things are exactly equal. If the vectors are of equal length, elements will be compared element-wise. If not, the shorter vector will be recycled. The length of the output will be equal to the length of the longer vector.
1:2 == rep(1:2,5)
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
rep(1:2,5) == 1:2
#[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
1:10 %in% 3:7
#[1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE
#is same as
sapply(1:10, function(a) any(a == 3:7))
#[1] FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE
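The recycling behaviour of == is easy to trip over when the intent was membership testing. A minimal sketch with made-up vectors x and targets:

```r
x <- c(1, 2, 3, 4)
targets <- c(2, 1)

# == recycles targets to 2, 1, 2, 1 and compares position by position
by_equal <- x == targets

# %in% asks, for each element of x, whether it appears anywhere in targets
by_in <- x %in% targets

print(by_equal)
print(by_in)
```

Here every positional comparison fails even though 1 and 2 are both present in targets, while %in% reports them as matches.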
NOTE: If possible, try to use identical or all.equal instead of == when checking whether two whole objects are equal.
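A short sketch of why the note matters, using the usual floating-point example. The variable names are illustrative only.

```r
a <- 0.1 + 0.2
b <- 0.3

exact    <- a == b                         # exact comparison of doubles
tolerant <- isTRUE(all.equal(a, b))        # comparison within a small tolerance
same_obj <- identical(c(1, 2), c(1, 2))    # a single TRUE/FALSE for the whole object
diff_len <- identical(1:2, rep(1:2, 5))    # no recycling: different lengths are not identical

print(c(exact = exact, tolerant = tolerant, same_obj = same_obj, diff_len = diff_len))
```

all.equal returns a description of the differences rather than FALSE when the objects differ, which is why it is wrapped in isTRUE here.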
How to use '%in%' operator in R?
An answer has already been given, but for a bit more detail, simply look at the %in% result:
df$col1 %in% myvector
# [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE TRUE TRUE
The above one is correct: when you subset df you keep the rows where the value is TRUE, i.e. rows 5, 9, 12 and 13,
versus
myvector %in% df$col1
# [1] TRUE TRUE FALSE FALSE FALSE TRUE TRUE FALSE FALSE FALSE
This one goes wrong: when you subset df it says to keep rows 1, 2, 6 and 7, and because the length here is only 10, the values are recycled for rows 11, 12 and 13 as TRUE, TRUE, FALSE, so you get rows 11 and 12 in your subset as well.
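The original data is not shown, so the following is a sketch with invented values for df and myvector, chosen so that the two logical vectors have the same TRUE positions as in the output above.

```r
# Invented data: 13 rows in df, 10 values in myvector
df <- data.frame(col1 = 1:13)
myvector <- c(5, 9, 101, 102, 103, 12, 13, 104, 105, 106)

right <- df$col1 %in% myvector   # one value per row of df (length 13)
wrong <- myvector %in% df$col1   # one value per element of myvector (length 10)

kept_right <- df$col1[right]     # rows that really match
kept_wrong <- df$col1[wrong]     # the short index is recycled over rows 11 to 13

print(kept_right)
print(kept_wrong)
```

The mask should always be as long as the thing being subset, which means the column being filtered goes on the left of %in%.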
How do you read the %in% operator in plain English?
I think your disconnect is understanding how to apply "in" to a vector. You wrote that you want to read it as "Look for 11 and 12 in the month column." You can indeed think of it that way. Your example was:
nov_dec <- filter(flights, month %in% c(11, 12))
And that could be expressed in plain English as:
Give me all the flights where one of the values in c(11, 12) is in the month column
But we could also say that 11 and 12 are "in" the vector c(11, 12). That's what the left-to-right reading would be:
Give me all the flights whose month is in the vector c(11, 12).
Or, expressed slightly differently and more verbosely:
Give me all the flights whose month is equal to one of the values in the vector c(11, 12)
This is conceptually similar to using a bunch of | operators in a row (month == 11 | month == 12), but it's best not to think of those as exactly equivalent. Instead of explicitly comparing x to every value in y, you're asking the question "is x equal to one of the values in y?" That's different in the same way that saying "please turn off the lights" is different from saying "please walk over to that plate on the wall and pull the little stick on it downwards." It's expressing what you want instead of how to figure it out, which makes your code more readable, and code is read more often than it's written, so that's important!
Now I'm getting way out of my area - again, I don't know what R actually does here - but the underlying method of answering the question might also be different. It might use a binary search algorithm to find out if x is in y.
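One concrete way in which the two forms are not exactly equivalent is how they treat missing values. A minimal base R sketch, with a made-up month vector standing in for the flights column:

```r
# Made-up stand-in for flights$month, including a missing value
month <- c(1, 11, 12, NA)

with_in <- month %in% c(11, 12)        # NA is simply "not in" the set
with_or <- month == 11 | month == 12   # NA compared with anything stays NA

print(with_in)
print(with_or)
```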
The R %in% operator
You can use all
> all(1:6 %in% 0:36)
[1] TRUE
> all(1:60 %in% 0:36)
[1] FALSE
On a similar note, if you want to check whether any of the elements is TRUE you can use any
> any(1:6 %in% 0:36)
[1] TRUE
> any(1:60 %in% 0:36)
[1] TRUE
> any(50:60 %in% 0:36)
[1] FALSE
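When all() comes back FALSE, the natural next question is which elements failed. A small sketch reusing the same ranges; the names x, allowed, not_allowed and n_matched are illustrative only.

```r
x <- 1:60
allowed <- 0:36

not_allowed <- x[!(x %in% allowed)]   # elements of x with no match in allowed
n_matched   <- sum(x %in% allowed)    # TRUE counts as 1, so this counts the matches

print(not_allowed)
print(n_matched)
```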
What does !! operator mean in R
The !! and {{ operators are placeholders to flag a variable as having been quoted. They are usually only needed if you intend to program with the tidyverse.
The tidyverse likes to leverage NSE (non-standard evaluation) in order to reduce the amount of repetition. The most frequent application is to the "data.frame" class, in which expressions/symbols are evaluated in the context of a data.frame before searching other scopes.
In order for this to work, some special functions (like those in the package dplyr) have arguments that are quoted. To quote an expression is to save the symbols that make up the expression and prevent its evaluation (in the context of the tidyverse they use "quosures", which are like quoted expressions except that they also contain a reference to the environment in which the expression was made).
While NSE is great for interactive use, it is notably harder to program with.
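The idea of quoting an expression and then evaluating it in the context of a data.frame can be sketched in base R alone, without quosures. This is only an illustration of the concept, not how dplyr implements it, and the names df, a and expr are invented.

```r
df <- data.frame(a = 1:3)   # invented data with a column named a
a <- 100                    # a variable of the same name outside the data

expr <- quote(a * 2)        # the expression is saved, not evaluated

in_df     <- eval(expr, df) # a is looked up in df first
in_global <- eval(expr)     # a is looked up in the calling environment

print(in_df)
print(in_global)
```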
Let's consider dplyr::select:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
iris <- as_tibble(iris)
my_select <- function(.data, col) {
select(.data, col)
}
select(iris, Species)
#> # A tibble: 150 × 1
#> Species
#> <fct>
#> 1 setosa
#> 2 setosa
#> 3 setosa
#> 4 setosa
#> 5 setosa
#> 6 setosa
#> 7 setosa
#> 8 setosa
#> 9 setosa
#> 10 setosa
#> # … with 140 more rows
my_select(iris, Species)
#> Error: object 'Species' not found
We encounter an error because within the scope of my_select the col argument is evaluated with standard evaluation and R cannot find a variable named Species.
If we attempt to create a variable in the global environment, we see that the function works - but it isn't behaving according to the heuristics of the tidyverse. In fact, it produces a note to inform you that this is ambiguous use.
Species <- "Sepal.Width"
my_select(iris, Species)
#> Note: Using an external vector in selections is ambiguous.
#> ℹ Use `all_of(col)` instead of `col` to silence this message.
#> ℹ See <https://tidyselect.r-lib.org/reference/faq-external-vector.html>.
#> This message is displayed once per session.
#> # A tibble: 150 × 1
#> Sepal.Width
#> <dbl>
#> 1 3.5
#> 2 3
#> 3 3.2
#> 4 3.1
#> 5 3.6
#> 6 3.9
#> 7 3.4
#> 8 3.4
#> 9 2.9
#> 10 3.1
#> # … with 140 more rows
To remedy this, we need to prevent evaluation with enquo() and unquote with !!, or just use {{.
my_select2 <- function(.data, col) {
col_quo <- enquo(col)
select(.data, !!col_quo) # look up whatever symbols were passed to the `col` argument
}
#' `{{` enables the user to skip using the `enquo()` step.
my_select3 <- function(.data, col) {
select(.data, {{col}})
}
my_select2(iris, Species)
#> # A tibble: 150 × 1
#> Species
#> <fct>
#> 1 setosa
#> 2 setosa
#> 3 setosa
#> 4 setosa
#> 5 setosa
#> 6 setosa
#> 7 setosa
#> 8 setosa
#> 9 setosa
#> 10 setosa
#> # … with 140 more rows
my_select3(iris, Species)
#> # A tibble: 150 × 1
#> Species
#> <fct>
#> 1 setosa
#> 2 setosa
#> 3 setosa
#> 4 setosa
#> 5 setosa
#> 6 setosa
#> 7 setosa
#> 8 setosa
#> 9 setosa
#> 10 setosa
#> # … with 140 more rows
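For the opposite situation, where the column name arrives as a character string rather than a bare symbol, the note printed earlier points to all_of(). A minimal sketch, assuming dplyr is installed; my_select_chr, df and wanted are hypothetical names.

```r
library(dplyr)

# Invented data for the sketch
df <- data.frame(a = 1:3, b = letters[1:3])

# col is expected to be a character vector of column names
my_select_chr <- function(.data, col) {
  select(.data, all_of(col))
}

wanted <- "b"
out <- my_select_chr(df, wanted)
print(names(out))
```

Here no quoting or unquoting is needed, because the function receives an ordinary value and all_of() states explicitly that it holds column names.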
In summary, you really only need !! and {{ if you are trying to apply NSE programmatically or do some type of programming on the language.
!!! is used to splice a list/vector of some sort into the arguments of some quoting expression.
library(rlang)
quo_let <- quo(paste(!!!LETTERS))
quo_let
#> <quosure>
#> expr: ^paste("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L",
#> "M", "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y",
#> "Z")
#> env: global
eval_tidy(quo_let)
#> [1] "A B C D E F G H I J K L M N O P Q R S T U V W X Y Z"
Created on 2021-08-30 by the reprex package (v2.0.1)
Operator in R argument
<<- and <- are both assignment operators, but they are subtly different.
<- only applies to the local environment where it is used, so if you use it to assign a variable inside a function, that variable will not be available outside that function.
If you use <<- inside a function to declare a new variable with a name you haven't used anywhere else, it will create that variable in the global environment. If a variable of that name already exists in a function that contains your function (or in any other enclosing environment), <<- assigns to that existing variable instead.
It is almost always a bad idea to assign to the global environment from within a function. If you absolutely have to write variables from inside a function, it is better to use assign to write the variable to another persistent environment.
local_assign <- function() {a <- 1;}
global_assign <- function() {b <<- 1;}
local_assign()
global_assign()
a
# Error: object 'a' not found
b
# [1] 1
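The two remaining cases from the explanation, assigning to a variable in a containing function and using assign with a dedicated environment, can be sketched as follows. The names make_counter, store and save_result are invented for the example.

```r
# <<- updates `count` in the enclosing function, not in the global environment
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1
    count
  }
}

counter <- make_counter()
first  <- counter()
second <- counter()

# assign() writes into an explicit, persistent environment instead
store <- new.env()
save_result <- function(value) {
  assign("result", value, envir = store)
}
save_result(42)
stored <- get("result", envir = store)

print(c(first, second, stored))
```

Keeping the state in store (or in the closure) leaves the global environment untouched, which is the point of the advice above.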
What do the %op% operators in R mean? For example %in%?
Put quotes around it to find the help page. Either of these work
> help("%in%")
> ?"%in%"
Once you get to the help page, you'll see that
‘%in%’ is currently defined as
‘"%in%" <- function(x, table) match(x, table, nomatch = 0) > 0’
Since time is a generic, I don't know what time(X2) returns without knowing what X2 is. But %in% tells you which items from the left hand side are also in the right hand side.
> c(1:5) %in% c(3:8)
[1] FALSE FALSE TRUE TRUE TRUE
See also, intersect
> intersect(c(1:5), c(3:8))
[1] 3 4 5
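The definition quoted from the help page can be checked directly. In this sketch, x and table are just the two vectors from the example above.

```r
x <- 1:5
table <- 3:8

positions <- match(x, table)                    # position of first match, NA if none
by_match  <- match(x, table, nomatch = 0) > 0   # the definition from the help page
by_in     <- x %in% table

print(positions)
print(identical(by_match, by_in))
```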
What is the %<-% operator in R?
It is a multiple assignment operator from zeallot.
%<-% and %->% invisibly return value.
These operators are used primarily for their assignment side-effect.
%<-% and %->% assign into the environment in which they are evaluated.
In other words, it creates multiple objects from a single line of code:
> library(zeallot)
> c(x, y, z) %<-% c(1, 3, 5)
> x
[1] 1
> y
[1] 3
> z
[1] 5