What is the difference between [ ] and [[ ]] in R?
[] = always returns object of same class (out of basic object classes), can select more than one element of an object
[[]] = can extract one element from list or data frame, returned object (out of basic object classes) not necessarily list/dataframe
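A minimal sketch of the distinction above, using a made-up list (the names x, a, b and c are illustrative only, not from the question):

```r
# A small list with mixed element types
x <- list(a = 1:3, b = "text", c = TRUE)

sub_single <- x["a"]          # [ keeps the container: a list of length 1
elem       <- x[["a"]]        # [[ extracts the element itself: an integer vector
sub_multi  <- x[c("a", "b")]  # [ can select several elements at once

class(sub_single)  # "list"
class(elem)        # "integer"
length(sub_multi)  # 2
```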
Is there a technical difference between = and <-
Yes there is. This is what the help page of '='
says:
The operators <- and = assign into the
environment in which they are
evaluated. The operator <- can be used
anywhere, whereas the operator = is
only allowed at the top level (e.g.,
in the complete expression typed at
the command prompt) or as one of the
subexpressions in a braced list of
expressions.
With "can be used" the help file means assigning an object here. In a function call you can't assign an object with =
because =
means assigning arguments there.
Basically, if you use <-
then you assign a variable that you will be able to use in your current environment. For example, consider:
matrix(1,nrow=2)
This just makes a 2 row matrix. Now consider:
matrix(1,nrow<-2)
This also gives you a two row matrix, but now we also have an object called nrow
which evaluates to 2! What happened is that in the second use we didn't give the argument nrow
the value 2; we assigned 2 to an object named nrow
and sent that value to the second argument of matrix
, which happens to be nrow.
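The side effect described above can be checked directly. This is only a sketch; the variable names created_by_eq and created_by_arrow are made up for the illustration:

```r
# Start from a clean slate so the checks below are meaningful
if (exists("nrow", inherits = FALSE)) rm(nrow)

m1 <- matrix(1, nrow = 2)    # '=' names the argument; no variable is created
created_by_eq <- exists("nrow", inherits = FALSE)     # FALSE

m2 <- matrix(1, nrow <- 2)   # '<-' creates a variable nrow with value 2,
                             # then passes 2 as the second positional argument
created_by_arrow <- exists("nrow", inherits = FALSE)  # TRUE

identical(dim(m1), dim(m2))  # TRUE: both are 2 x 1 matrices
```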
Edit:
As for the edited questions. Both are the same. The use of =
or <-
can cause a lot of discussion as to which one is best. Many style guides advocate <-
and I agree with that, but do keep spaces around <-
assignments or they can become quite hard to interpret. If you don't use spaces (you should, except on twitter), I prefer =
, and never use ->
!
But really it doesn't matter what you use as long as you are consistent in your choice. Using =
on one line and <-
on the next results in very ugly code.
Difference between [] and $ operators for subsetting
Below we will use a one-row data frame in order to provide briefer output:
mtcars1 <- mtcars[1, ]
Note the differences among these. We can use class
as in class(mtcars["hp"])
to investigate the class of the return value.
The first two correspond to the code in the question and return a data frame and plain vector respectively. The key differences between [
and $
are that [
(1) can specify multiple columns, (2) allows passing of a variable as the index and (3) returns a data frame (although see examples later on) whereas $
(1) can only specify a single column, (2) the index must be hard coded and (3) it returns a vector.
mtcars1["hp"] # returns data frame
## hp
## Mazda RX4 110
mtcars1$hp # returns plain vector
## [1] 110
Other examples where index is a single element. Note that the first and second examples below are actually the same as drop = TRUE
is the default.
mtcars1[, "hp"] # returns plain vector
## [1] 110
mtcars1[, "hp", drop = TRUE] # returns plain vector
## [1] 110
mtcars1[, "hp", drop = FALSE] # returns data frame
## hp
## Mazda RX4 110
Also there is the [[
operator which is like the $
operator except it can accept a variable as the index whereas $
requires the index to be hard coded:
mtcars1[["hp"]] # returns plain vector
## [1] 110
Others where index specifies multiple elements. $
and [[
cannot be used with multiple elements so these examples only use [
:
mtcars1[c("mpg", "hp")] # returns data frame
## mpg hp
## Mazda RX4 21 110
mtcars1[, c("mpg", "hp")] # returns data frame
## mpg hp
## Mazda RX4 21 110
mtcars1[, c("mpg", "hp"), drop = FALSE] # returns data frame
## mpg hp
## Mazda RX4 21 110
mtcars1[, c("mpg", "hp"), drop = TRUE] # returns list
## $mpg
## [1] 21
##
## $hp
## [1] 110
[
mtcars[foo]
can return more than one column if foo
is a vector with more than one element, e.g. mtcars[c("hp", "mpg")]
, and in all cases the return value is a data.frame even if foo
has only one element (as it does in the question).
There is also mtcars[, foo, drop = FALSE]
which returns the same value as mtcars[foo]
so it always returns a data frame. With drop = TRUE
it returns the column itself if foo
specifies a single column. If it specifies multiple columns, the result is still a data frame unless only a single row is selected, in which case it is a list (as in the one-row examples above).
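How drop = TRUE behaves seems to depend on how many rows are selected, so it may be worth checking on the full mtcars data as well as on a single row. A small sketch (the variable names are made up):

```r
multi_default <- mtcars[, c("mpg", "hp")]                # data frame
multi_drop    <- mtcars[, c("mpg", "hp"), drop = TRUE]   # 32 rows: still a data frame
one_row_drop  <- mtcars[1, c("mpg", "hp"), drop = TRUE]  # 1 row: plain list
single_drop   <- mtcars[, "hp", drop = TRUE]             # single column: numeric vector

class(multi_drop)    # "data.frame"
class(one_row_drop)  # "list"
class(single_drop)   # "numeric"
```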
[[
On the other hand mtcars[[foo]]
only works if foo has one element and it returns that column, not a data frame.
$
mtcars$hp
also only works for a single column, like [[
, and returns the column, not a data frame containing that column.
mtcars$hp
is like mtcars[["hp"]]
; however, there is no possibility to pass a variable index with $
. One can only hard-code the index with $
.
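To illustrate the point about variable indices, here is a short sketch in which the column name is held in a variable (col is an illustrative name):

```r
col <- "hp"                 # column name stored in a variable

by_double <- mtcars[[col]]  # [[ evaluates col, so this is the hp column
by_dollar <- mtcars$col     # $ looks for a column literally named "col"

identical(by_double, mtcars$hp)  # TRUE
is.null(by_dollar)               # TRUE: mtcars has no column called "col"
```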
subset
Note that this works:
subset(mtcars, hp > 150)
returning a data frame containing those rows where the hp
column exceeds 150
:
mpg cyl disp hp drat wt qsec vs am gear carb
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
other objects
The above pertain to data frames but other objects that can use $
, [
and [[
will have their own rules. In particular if m
is a matrix, e.g. m <- as.matrix(BOD)
, then m[, 1]
is a vector, not a one column matrix, but m[, 1, drop = FALSE]
is a one column matrix. m[[1]]
and m[1]
are both the first element of m
, not the first column. m$a
does not work at all.
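The matrix rules above can be sketched as follows, using the built-in BOD data frame mentioned in the text (the result names are made up):

```r
m <- as.matrix(BOD)          # 6 x 2 numeric matrix with columns Time and demand

v  <- m[, 1]                 # plain numeric vector, dimensions dropped
m1 <- m[, 1, drop = FALSE]   # still a matrix, 6 x 1
e1 <- m[[1]]                 # first element of m
e2 <- m[1]                   # also the first element of m

is.matrix(v)       # FALSE
dim(m1)            # 6 1
identical(e1, e2)  # TRUE

# $ is not defined for matrices, so this raises an error
dollar_fails <- inherits(try(m$a, silent = TRUE), "try-error")  # TRUE
```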
help
See ?Extract
for more information. Also ?"$"
, ?"["
and ?"[["
all get to the same page, as well.
The difference between bracket [ ] and double bracket [[ ]] for accessing the elements of a list or dataframe
The R Language Definition is handy for answering these types of questions:
- http://cran.r-project.org/doc/manuals/R-lang.html#Indexing
R has three basic indexing operators, with syntax displayed by the following examples
x[i]
x[i, j]
x[[i]]
x[[i, j]]
x$a
x$"a"
For vectors and matrices the [[
forms are rarely used, although they have some slight semantic differences from the [
form (e.g. it drops any names or dimnames attribute, and that partial matching is used for character indices). When indexing multi-dimensional structures with a single index, x[[i]]
or x[i]
will return the i
th sequential element of x
.
For lists, one generally uses [[
to select any single element, whereas [
returns a list of the selected elements.
The [[
form allows only a single element to be selected using integer or character indices, whereas [
allows indexing by vectors. Note though that for a list, the index can be a vector and each element of the vector is applied in turn to the list, the selected component, the selected component of that component, and so on. The result is still a single element.
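The last point, a vector index applied in turn with [[, is easy to miss. A small sketch with a made-up nested list:

```r
x <- list(a = list(b = 10, c = 20), d = "z")

r1 <- x[[c(1, 2)]]      # same as x[[1]][[2]]
r2 <- x[[c("a", "b")]]  # same as x[["a"]][["b"]]
r3 <- x[c(1, 2)]        # [ with a vector: a list holding elements 1 and 2

r1          # 20
r2          # 10
length(r3)  # 2
```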
What's the difference between `=` and `<-` in R?
From here:
The operators <- and = assign into the environment in which they are evaluated. The operator <- can be used anywhere, whereas the operator = is only allowed at the top level (e.g., in the complete expression typed at the command prompt) or as one of the subexpressions in a braced list of expressions.
In R programming, what's the difference between & vs &&, and | vs ||
&& and || can only handle a single logical test on each side of the operator:
a <- c(T, F, F, F)
b <- c(T, F, F, F)
a && b
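In R versions before 4.3.0 this returns
[1] TRUE
because only the first elements of a
and b
are tested. Since R 4.3.0, calling && or || with an operand of length greater than one is an error instead.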
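For comparison, a sketch of the element-wise operator on the same vectors, plus two ways of reducing the result to a single value for use in if(). The result names are made up for the illustration:

```r
a <- c(TRUE, FALSE, FALSE, FALSE)
b <- c(TRUE, FALSE, FALSE, FALSE)

elementwise <- a & b         # one result per position
all_true    <- all(a & b)    # single value: are all pairs TRUE?
any_true    <- any(a & b)    # single value: is at least one pair TRUE?
first_pair  <- a[1] && b[1]  # explicit length-one operands for &&

elementwise  # TRUE FALSE FALSE FALSE
all_true     # FALSE
any_true     # TRUE
first_pair   # TRUE
```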
Edit:
Consider the following, where we 'rotate' a
and b
after each &&
test:
a <- c(T, F, T, F)
b <- c(T, F, F, T)
for (i in seq_along(a)) {
  cat(paste0("'a' is: ", paste0(a, collapse = ", "), " and\n'b' is: ", paste0(b, collapse = ", "), "\n"))
  # Note: a && b on length-4 vectors only runs on R < 4.3.0 (later versions raise an error)
  print(paste0("'a && b' is: ", a && b))
  # rotate both vectors one position to the left
  a <- c(a[2:length(a)], a[1])
  b <- c(b[2:length(b)], b[1])
}
Gives us:
'a' is: TRUE, FALSE, TRUE, FALSE and
'b' is: TRUE, FALSE, FALSE, TRUE
[1] "'a && b' is: TRUE"
'a' is: FALSE, TRUE, FALSE, TRUE and
'b' is: FALSE, FALSE, TRUE, TRUE
[1] "'a && b' is: FALSE"
'a' is: TRUE, FALSE, TRUE, FALSE and
'b' is: FALSE, TRUE, TRUE, FALSE
[1] "'a && b' is: FALSE"
'a' is: FALSE, TRUE, FALSE, TRUE and
'b' is: TRUE, TRUE, FALSE, FALSE
[1] "'a && b' is: FALSE"
Additionally, &&
and ||
stop evaluating as soon as the result is determined:
FALSE & a_not_existing_object
TRUE | a_not_existing_object
Returns:
Error: object 'a_not_existing_object' not found
Error: object 'a_not_existing_object' not found
But:
FALSE && a_not_existing_object
TRUE || a_not_existing_object
Returns:
[1] FALSE
[1] TRUE
Because FALSE
AND anything is FALSE, and TRUE
OR anything is TRUE, so the right-hand side is never evaluated.
This last behavior of &&
and ||
is especially useful if you want to check in your control-flow for an element that may not exist:
if (exists("a_not_existing_object") && a_not_existing_object > 42) {...}
This way the evaluation stops after the first expression evaluates to FALSE
and the a_not_existing_object > 42
part is not even attempted!
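The guard pattern can be wrapped in a small function. This is only a sketch: exceeds is a hypothetical helper, and note that exists() takes the variable name as a character string:

```r
# Hypothetical helper: TRUE only if a variable called `name` is visible from
# the caller's environment AND its value exceeds `limit`
exceeds <- function(name, limit = 42) {
  env <- parent.frame()
  # get() is only reached when exists() returned TRUE (short-circuit)
  exists(name, envir = env) && get(name, envir = env) > limit
}

big <- 100
res_big     <- exceeds("big")                    # TRUE
res_missing <- exceeds("a_not_existing_object")  # FALSE, and no error
```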
R difference between [[]] and []
all_data[1]=list(5,6)
gives you a Warning (not an error) that the lengths aren't the same. You can't set a one-element list to a two-element list. It's like trying x <- 1; x[1] <- 1:2
.
But you can set one element of a list to contain another list, which is why all_data[[1]]=list(5,6)
works.
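A sketch of both assignments, assuming all_data starts out as a three-element list (the question's actual data is not shown, so this starting value is made up):

```r
all_data <- list(1, 2, 3)

# Replacing ONE slot with a TWO-element list: R keeps only the first item
# and warns about the length mismatch. The handler records the warning.
warned <- FALSE
withCallingHandlers(
  all_data[1] <- list(5, 6),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)
first_after_single <- all_data[[1]]  # 5

# Storing the whole list inside slot 1 works without a warning
all_data[[1]] <- list(5, 6)
first_after_double <- all_data[[1]]  # a list of length 2
```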
Percent difference between two numbers in R
This really seems more of a question about how to find a percentage difference in general, which is something you can easily google. Nonetheless, to calculate a % difference:
((original value - new value) / original value) * 100.
So,
((combined_data2015_2019$Economy_GDPperCapita_2015 -
combined_data2015_2019$Economy_GDPpercapita_2019) /
combined_data2015_2019$Economy_GDPperCapita_2015) * 100
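((1.29025 - 1.34) / 1.29025) * 100 ≈ -3.86, i.e. a change of about 3.86% (negative with this formula because the new value is larger than the original).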
Hopefully this answers your question.
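The formula can also be wrapped in a function so that it applies to whole columns at once. A minimal sketch, where pct_diff is a hypothetical helper name and the sign convention is the one used above (original minus new):

```r
# Hypothetical helper implementing ((original - new) / original) * 100.
# It is vectorized, so it also works on two data frame columns.
pct_diff <- function(original, new) {
  ((original - new) / original) * 100
}

single <- pct_diff(1.29025, 1.34)            # about -3.86
many   <- pct_diff(c(100, 200), c(90, 250))  # 10 and -25
```

With this convention a negative result means the new value is larger than the original.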