What's the difference between [1], [1,], [,1], [[1]] for a dataframe in R?
In R, operators are not used for one data type only. Operators can be overloaded for whatever data type you like (e.g. also S3/S4 classes).
In fact, that's the case for data.frames.
As data.frames are lists, [i] and [[i]] (and $) show list-like behaviour. Row and column indices do have an intuitive meaning for tables, and data.frames look like tables. Probably that is the reason why [i, j] methods were defined for data.frames.
You can even look at the definitions; they are coded in the S3 system (so the names follow the pattern methodname.class):
> `[.data.frame`
and
> `[[.data.frame`
(the backticks quote the function name, otherwise R would try to use the operator and end up with a syntax error)
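As a minimal sketch of the four forms asked about in the title, the following uses a small made-up data frame (the name df and its columns are assumptions for illustration only) and checks what each form returns:

```r
# Hypothetical toy data frame, used only for illustration
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

is.list(df)        # TRUE: a data frame is a list of columns

# List-style indexing (one index, no comma)
class(df[1])       # "data.frame": a one-column data frame
class(df[[1]])     # "integer": the first column itself

# Table-style indexing (row index, column index)
class(df[1, ])     # "data.frame": the first row, all columns
class(df[, 1])     # "integer": the first column, simplified to a vector

# The S3 methods behind these operators are ordinary functions
is.function(`[.data.frame`)
is.function(`[[.data.frame`)
```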
Are dataframe[ ,-1] and dataframe[-1] the same?
Almost.
[-1] uses the fact that a data.frame is a list, so dataframe[-1] returns another data.frame (list) without the first element (i.e. column).
[ ,-1] treats the data.frame like a two-dimensional array, so dataframe[, -1] gives you the sub-array that does not include the first column.
At first glance they sound the same, but the second form also tries by default to reduce the dimension of the result. So depending on the dimensions of your dataframe you may get a data.frame or a vector, see for example:
> data <- data.frame(a = 1:2, b = 3:4)
> class(data[-1])
[1] "data.frame"
> class(data[, -1])
[1] "integer"
You can use drop = FALSE to override that behavior:
> class(data[, -1, drop = FALSE])
[1] "data.frame"
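For contrast, here is a small sketch with a three-column data frame (the name wide is made up for this example). Two columns remain after removing the first one, so there is nothing to simplify and both forms should give a data.frame:

```r
# Hypothetical three-column data frame
wide <- data.frame(a = 1:2, b = 3:4, c = 5:6)

class(wide[-1])      # "data.frame"
class(wide[, -1])    # "data.frame": two columns are left, so no dropping

# Both forms keep the same columns
names(wide[-1])
names(wide[, -1])
```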
The difference between bracket [ ] and double bracket [[ ]] for accessing the elements of a list or dataframe
The R Language Definition is handy for answering these types of questions:
- http://cran.r-project.org/doc/manuals/R-lang.html#Indexing
R has three basic indexing operators, with syntax displayed by the following examples
x[i]
x[i, j]
x[[i]]
x[[i, j]]
x$a
x$"a"
For vectors and matrices the [[ forms are rarely used, although they have some slight semantic differences from the [ form (e.g. it drops any names or dimnames attribute, and that partial matching is used for character indices). When indexing multi-dimensional structures with a single index, x[[i]] or x[i] will return the ith sequential element of x.
For lists, one generally uses [[ to select any single element, whereas [ returns a list of the selected elements.
The [[ form allows only a single element to be selected using integer or character indices, whereas [ allows indexing by vectors. Note though that for a list, the index can be a vector and each element of the vector is applied in turn to the list, the selected component, the selected component of that component, and so on. The result is still a single element.
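The list behaviour described in the quoted passage can be sketched as follows. The nested list lst and its element names are assumptions chosen for this example:

```r
# Hypothetical nested list
lst <- list(a = 1:3, b = list(c = "deep", d = 4))

single  <- lst["a"]           # [ returns a list holding the selected element
elem    <- lst[["a"]]         # [[ returns the element itself
several <- lst[c("a", "b")]   # [ accepts a vector of indices

# With [[, a vector index is applied in turn to each level of the list
nested_by_name <- lst[[c("b", "c")]]   # same as lst[["b"]][["c"]]
nested_by_pos  <- lst[[c(2, 1)]]       # same element, selected by position

class(single)     # "list"
class(elem)       # "integer"
nested_by_name    # "deep"
```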
Difference between single and double bracket in calling columns
A data.frame is a list with columns of equal length. With [[ we extract the column as a vector, while with [ we get a data.frame with single or multiple columns. Another option to return a vector with [ is to add the , to indicate explicitly that it is a column index; drop = TRUE is then the default for a data.frame, so a single column is simplified to a vector:
myDataset[, 1]
If we still want a single-column data.frame:
myDataset[, 1, drop = FALSE]
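Why the distinction matters in practice: many functions expect a vector rather than a one-column data.frame. A small sketch, assuming a made-up myDataset with columns id and score:

```r
# Hypothetical data; the column names are assumptions for this example
myDataset <- data.frame(id = 1:3, score = c(2.5, 3.0, 4.5))

col_vec <- myDataset[["score"]]   # numeric vector
col_df  <- myDataset["score"]     # one-column data.frame

mean(col_vec)     # works on the vector
length(col_vec)   # 3: number of values
length(col_df)    # 1: number of columns, since a data.frame is a list
nrow(col_df)      # 3: number of rows

# mean() on the data.frame warns and returns NA instead of the average
suppressWarnings(mean(col_df))
```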
What is the difference between [ ] and [[ ]] in R?
[] = returns an object of the same class (out of basic object classes) and can select more than one element of an object; the exception is the matrix-style form x[, j], where drop = TRUE can simplify the result
[[]] = extracts exactly one element from a list or data frame; the returned object (out of basic object classes) is not necessarily a list/dataframe
What's the difference between `1L` and `1`?
So, @James and @Brian explained what 3L means. But why would you use it?
Most of the time it makes no difference - but sometimes you can use it to get your code to run faster and consume less memory. A double ("numeric") vector uses 8 bytes per element. An integer vector uses only 4 bytes per element. For large vectors, that's less wasted memory and less to wade through for the CPU (so it's typically faster).
Mostly this applies when working with indices.
Here's an example where adding 1 to an integer vector turns it into a double vector:
x <- 1:100
typeof(x) # integer
y <- x+1
typeof(y) # double, twice the memory size
object.size(y) # 840 bytes (on win64)
z <- x+1L
typeof(z) # still integer
object.size(z) # 440 bytes (on win64)
...but also note that working excessively with integers can be dangerous:
1e9L * 2L # Works fine; fast lean and mean!
1e9L * 4L # Ooops, overflow!
...and as @Gavin pointed out, the range for integers is roughly -2e9 to 2e9.
A caveat though is that this applies to the current R version (2.13). R might change this at some point (64-bit integers would be sweet, which could enable vectors of length > 2e9). To be safe, you should use .Machine$integer.max whenever you need the maximum integer value (and negate that for the minimum).
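A short sketch of the points above: the L suffix creates an integer rather than a double, and integer arithmetic past .Machine$integer.max yields NA (with a warning), while mixing in a double promotes the result. The variable names are made up for illustration:

```r
is.integer(1L)      # TRUE
is.integer(1)       # FALSE: a plain 1 is a double
1L == 1             # TRUE: equal in value
identical(1L, 1)    # FALSE: different storage types

int_max <- .Machine$integer.max

# Integer + integer stays integer, so this overflows to NA (with a warning)
overflowed <- suppressWarnings(int_max + 1L)
is.na(overflowed)   # TRUE

# Integer + double is promoted to double, so no overflow here
promoted <- int_max + 1
typeof(promoted)    # "double"
```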
Difference between [] and $ operators for subsetting
Below we will use a one-row data frame in order to provide briefer output:
mtcars1 <- mtcars[1, ]
Note the differences among these. We can use class, as in class(mtcars["hp"]), to investigate the class of the return value.
The first two correspond to the code in the question and return a data frame and plain vector respectively. The key differences between [ and $ are that [ (1) can specify multiple columns, (2) allows passing of a variable as the index and (3) returns a data frame (although see examples later on) whereas $ (1) can only specify a single column, (2) the index must be hard coded and (3) it returns a vector.
mtcars1["hp"] # returns data frame
## hp
## Mazda RX4 110
mtcars1$hp # returns plain vector
## [1] 110
Other examples where the index is a single element. Note that the first and second examples below are actually the same, as drop = TRUE is the default.
mtcars1[, "hp"] # returns plain vector
## [1] 110
mtcars1[, "hp", drop = TRUE] # returns plain vector
## [1] 110
mtcars1[, "hp", drop = FALSE] # returns data frame
## hp
## Mazda RX4 110
Also there is the [[ operator, which is like the $ operator except it can accept a variable as the index whereas $ requires the index to be hard coded:
mtcars1[["hp"]] # returns plain vector
## [1] 110
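To illustrate the point about variable indices, here is a minimal sketch in which the column name is stored in a variable (col is a made-up name). With $ the name after the operator is taken literally, so the variable's value is never consulted:

```r
mtcars1 <- mtcars[1, ]
col <- "hp"               # column name held in a variable

by_double <- mtcars1[[col]]   # looks up the column named "hp"
by_single <- mtcars1[col]     # one-column data frame for "hp"
by_dollar <- mtcars1$col      # looks for a column literally named "col"

by_double           # 110
class(by_single)    # "data.frame"
is.null(by_dollar)  # TRUE: mtcars has no column called "col"
```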
Others where the index specifies multiple elements. $ and [[ cannot be used with multiple elements so these examples only use [:
mtcars1[c("mpg", "hp")] # returns data frame
## mpg hp
## Mazda RX4 21 110
mtcars1[, c("mpg", "hp")] # returns data frame
## mpg hp
## Mazda RX4 21 110
mtcars1[, c("mpg", "hp"), drop = FALSE] # returns data frame
## mpg hp
## Mazda RX4 21 110
mtcars1[, c("mpg", "hp"), drop = TRUE] # returns list
## $mpg
## [1] 21
##
## $hp
## [1] 110
[
mtcars[foo] can return more than one column if foo is a vector with more than one element, e.g. mtcars[c("hp", "mpg")], and in all cases the return value is a data.frame even if foo has only one element (as it does in the question).
There is also mtcars[, foo, drop = FALSE], which returns the same value as mtcars[foo], so it always returns a data frame. With drop = TRUE it returns the column itself if foo specifies a single column. If foo specifies multiple columns, an explicit drop = TRUE returns a list rather than a data.frame only when the result has a single row (as with the one-row mtcars1 above); with more rows the result stays a data.frame.
[[
On the other hand, mtcars[[foo]] only works if foo has one element and it returns that column, not a data frame.
$
mtcars$hp also only works for a single column, like [[, and returns the column, not a data frame containing that column. mtcars$hp is like mtcars[["hp"]]; however, there is no possibility to pass a variable index with $. One can only hard-code the index with $.
subset
Note that this works:
subset(mtcars, hp > 150)
returning a data frame containing those rows where the hp column exceeds 150:
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
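For comparison, the same row filter can be written with [ and a logical row index. This is only a sketch; the variable names are made up, and the two forms agree here because hp contains no missing values (subset() drops rows where the condition is NA, whereas a logical index with NA yields NA rows):

```r
by_subset  <- subset(mtcars, hp > 150)
by_bracket <- mtcars[mtcars$hp > 150, ]   # the comma is needed to select rows

nrow(by_subset)                     # 13 rows, as in the output above
all.equal(by_subset, by_bracket)    # TRUE
```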
other objects
The above pertain to data frames, but other objects that can use $, [ and [[ will have their own rules. In particular if m is a matrix, e.g. m <- as.matrix(BOD), then m[, 1] is a vector, not a one column matrix, but m[, 1, drop = FALSE] is a one column matrix. m[[1]] and m[1] are both the first element of m, not the first column. m$a does not work at all.
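The matrix rules above can be checked with a short sketch using the built-in BOD data set (6 rows, columns Time and demand). The helper variable names are assumptions for illustration:

```r
m <- as.matrix(BOD)

col_vec <- m[, 1]                  # simplified to a plain vector
col_mat <- m[, 1, drop = FALSE]    # stays a one-column matrix

is.matrix(col_vec)   # FALSE
dim(col_mat)         # 6 1

# A single index treats the matrix as one long vector
m[1]
m[[1]]

# $ is not defined for matrices; capture the error message instead of stopping
dollar_result <- tryCatch(m$Time, error = function(e) conditionMessage(e))
dollar_result
```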
help
See ?Extract for more information. Also ?"$", ?"[" and ?"[[" all get to the same page, as well.