What's The Difference Between [1], [1,], [,1], [[1]] for a Dataframe in R

What's the difference between [1], [1,], [,1], [[1]] for a dataframe in R?

In R, operators are not used for one data type only. Operators can be overloaded for whatever data type you like (e.g. also S3/S4 classes).

In fact, that's the case for data.frames.

  • as data.frames are lists, the [i] and [[i]] (and $) show list-like behaviour.

  • row and column indices have an intuitive meaning for tables, and data.frames look like tables. That is probably why [i, j] methods were defined for data.frame.

You can even look at the definitions; they are written as S3 methods (hence the name pattern methodname.class):
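To make the four forms from the question concrete, here is a small sketch. The data frame df and its column names are made up for illustration; the classes in the comments are what base R should report for this input.

```r
# Small illustrative data frame (names are invented for this sketch)
df <- data.frame(a = 1:3, b = c("x", "y", "z"))

one_col_df <- df[1]     # list-style: data frame holding only column a
first_row  <- df[1, ]   # table-style: first row, still a data frame
first_col  <- df[, 1]   # table-style: first column, dropped to a vector
col_vector <- df[[1]]   # list-style: the column itself

class(one_col_df)   # "data.frame"
class(first_row)    # "data.frame"
class(first_col)    # "integer"
class(col_vector)   # "integer"

identical(first_col, col_vector)   # TRUE
```

So [1] and [1, ] keep the data frame wrapper, while [, 1] and [[1]] hand back the bare column.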

> `[.data.frame`

and

> `[[.data.frame`

(the backticks quote the function name, otherwise R would try to use the operator and end up with a syntax error)
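As a rough illustration of what the backticks buy you, the methods can be treated like any other function. The data frame df is again a made-up example, and getS3method() (from the utils package, attached by default) is just one of several ways to look a method up.

```r
# Backticks let us refer to the operator method as an ordinary function
bracket_method <- `[.data.frame`
is.function(bracket_method)                    # TRUE

# getS3method() finds a method by generic name and class name
is.function(getS3method("[[", "data.frame"))   # TRUE

# Calling the method directly behaves like using the operator
df <- data.frame(a = 1:2, b = 3:4)
direct   <- `[.data.frame`(df, 1)
operator <- df[1]
identical(direct, operator)                    # TRUE
```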

Are dataframe[ ,-1] and dataframe[-1] the same?

Almost.

[-1] uses the fact that a data.frame is a list, so when you do dataframe[-1] it returns another data.frame (list) without the first element (i.e. column).

[, -1] treats the data.frame like a two-dimensional array (a table with rows and columns), so when you do dataframe[, -1] you get the part of the table that does not include the first column.

At first glance the two look the same, but the second form also tries, by default, to reduce the dimensions of the result it returns. So depending on the dimensions of your data.frame you may get a data.frame or a vector. For example:

> data <- data.frame(a = 1:2, b = 3:4)
> class(data[-1])
[1] "data.frame"
> class(data[, -1])
[1] "integer"

You can use drop = FALSE to override that behavior:

> class(data[, -1, drop = FALSE])
[1] "data.frame"

The difference between bracket [ ] and double bracket [[ ]] for accessing the elements of a list or dataframe

The R Language Definition is handy for answering these types of questions:

  • http://cran.r-project.org/doc/manuals/R-lang.html#Indexing


R has three basic indexing operators, with syntax displayed by the following examples



x[i]
x[i, j]
x[[i]]
x[[i, j]]
x$a
x$"a"


For vectors and matrices the [[ forms are rarely used, although they have some slight semantic differences from the [ form (e.g. it drops any names or dimnames attribute, and that partial matching is used for character indices). When indexing multi-dimensional structures with a single index, x[[i]] or x[i] will return the ith sequential element of x.


For lists, one generally uses [[ to select any single element, whereas [ returns a list of the selected elements.


The [[ form allows only a single element to be selected using integer or character indices, whereas [ allows indexing by vectors. Note though that for a list, the index can be a vector and each element of the vector is applied in turn to the list, the selected component, the selected component of that component, and so on. The result is still a single element.
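A short sketch of the list rules quoted above, including the recursive case where the index to [[ is a vector. The list x and its element names are invented for the example.

```r
# A nested list to index into (contents are illustrative)
x <- list(a = 1:3, b = list(c = "deep", d = 4.5))

sub  <- x["a"]           # [ returns a list containing the selected element
elem <- x[["a"]]         # [[ returns the element itself
two  <- x[c("a", "b")]   # [ accepts a vector and returns a longer list

# A vector index to [[ is applied in turn: first "b", then "c" inside it
nested <- x[[c("b", "c")]]
same   <- x[["b"]][["c"]]

class(sub)                # "list"
class(elem)               # "integer"
length(two)               # 2
identical(nested, same)   # TRUE, both are "deep"
```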

Difference between single and double bracket in calling columns

A data.frame is a list whose columns all have the same length. With [[ we extract a column as a vector, while [ returns a data.frame with one or more columns. Another way to get a vector from [ is to include the comma, which marks the index explicitly as a column index; the default drop = TRUE for data.frames then applies:

myDataset[, 1]

If we still want a single-column data.frame:

myDataset[, 1, drop = FALSE]


What is the difference between [ ] and [[ ]] in R?

[ ] = with a single index, returns an object of the same class as the original (out of the basic object classes); can select more than one element of an object

[[ ]] = extracts exactly one element from a list or data frame; the returned object (out of the basic object classes) is not necessarily a list or data frame

What's the difference between `1L` and `1`?

The L suffix, as in 1L or 3L, tells R to store the literal as an integer rather than a double. But why would you use it?

Most of the time it makes no difference - but sometimes you can use it to get your code to run faster and consume less memory. A double ("numeric") vector uses 8 bytes per element. An integer vector uses only 4 bytes per element. For large vectors, that's less wasted memory and less to wade through for the CPU (so it's typically faster).

Mostly this applies when working with indices.
Here's an example where adding 1 to an integer vector turns it into a double vector:

x <- 1:100
typeof(x) # integer

y <- x+1
typeof(y) # double, twice the memory size
object.size(y) # 840 bytes (on win64)

z <- x+1L
typeof(z) # still integer
object.size(z) # 440 bytes (on win64)

...but also note that working excessively with integers can be dangerous:

1e9L * 2L # Works fine; fast lean and mean!
1e9L * 4L # Ooops, overflow!

...and as @Gavin pointed out, the range for integers is roughly -2e9 to 2e9.

A caveat, though: this describes R as of version 2.13, the current release when this was written. R might change this at some point (64-bit integers would be sweet). Since R 3.0.0, 64-bit builds do support vectors longer than 2e9 elements, but the integer type itself is still 32-bit. To be safe, use .Machine$integer.max whenever you need the maximum integer value (and negate it for the minimum).
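As a minimal sketch of the overflow behaviour and of .Machine$integer.max, assuming a standard R build with 32-bit integers (the variable names are only for illustration):

```r
int_max <- .Machine$integer.max   # largest representable integer
typeof(int_max)                   # "integer"

# Integer arithmetic past the limit yields NA (R also emits a warning,
# silenced here so the sketch runs quietly)
overflowed <- suppressWarnings(int_max + 1L)
is.na(overflowed)                 # TRUE

# Using a double literal (no L) promotes the result, so nothing overflows
promoted <- int_max + 1
typeof(promoted)                  # "double"
promoted > int_max                # TRUE
```

The trade-off is the one described above: integers are smaller, but doubles fail less surprisingly near the limit.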

Difference between [] and $ operators for subsetting

Below we will use a one-row data frame to keep the output brief:

mtcars1 <- mtcars[1, ]

Note the differences among these. We can use class as in class(mtcars["hp"]) to investigate the class of the return value.

The first two correspond to the code in the question and return a data frame and plain vector respectively. The key differences between [ and $ are that [ (1) can specify multiple columns, (2) allows passing of a variable as the index and (3) returns a data frame (although see examples later on) whereas $ (1) can only specify a single column, (2) the index must be hard coded and (3) it returns a vector.

mtcars1["hp"]  # returns data frame
##            hp
## Mazda RX4 110

mtcars1$hp # returns plain vector
## [1] 110

Other examples where index is a single element. Note that the first and second examples below are actually the same as drop = TRUE is the default.

mtcars1[, "hp"] # returns plain vector
## [1] 110

mtcars1[, "hp", drop = TRUE] # returns plain vector
## [1] 110

mtcars1[, "hp", drop = FALSE] # returns data frame
##            hp
## Mazda RX4 110

Also there is the [[ operator which is like the $ operator except it can accept a variable as the index whereas $ requires the index to be hard coded:

mtcars1[["hp"]] # returns plain vector
## [1] 110

Others where index specifies multiple elements. $ and [[ cannot be used with multiple elements so these examples only use [:

mtcars1[c("mpg", "hp")] # returns data frame
##           mpg  hp
## Mazda RX4  21 110

mtcars1[, c("mpg", "hp")] # returns data frame
##           mpg  hp
## Mazda RX4  21 110

mtcars1[, c("mpg", "hp"), drop = FALSE] # returns data frame
##           mpg  hp
## Mazda RX4  21 110

mtcars1[, c("mpg", "hp"), drop = TRUE] # returns list
## $mpg
## [1] 21
##
## $hp
## [1] 110

[

mtcars[foo] can return more than one column if foo is a vector with more than one element, e.g. mtcars[c("hp", "mpg")], and in all cases the return value is a data.frame even if foo has only one element (as it does in the question).

There is also mtcars[, foo, drop = FALSE] which returns the same value as mtcars[foo] so it always returns a data frame. With drop = TRUE it will return a list rather than a data.frame in the case that foo specifies multiple columns and returns the column itself if it specifies a single column.

[[

On the other hand mtcars[[foo]] only works if foo has one element and it returns that column, not a data frame.

$

mtcars$hp also only works for a single column, like [[, and returns the column, not a data frame containing that column.

mtcars$hp is like mtcars[["hp"]]; however, there is no possibility to pass a variable index with $. One can only hard-code the index with $.
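The point about variable indices can be sketched as follows. The variable name col_name is hypothetical; mtcars ships with R.

```r
col_name <- "hp"                     # column name held in a variable

by_brackets <- mtcars[[col_name]]    # uses the value of col_name, i.e. "hp"
by_dollar   <- mtcars$col_name       # looks for a column literally named "col_name"

is.null(by_dollar)                   # TRUE: mtcars has no such column
identical(by_brackets, mtcars$hp)    # TRUE
```

This is why [[ is usually the form to reach for inside functions and loops, where the column name arrives as a string.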

subset

Note that this works:

subset(mtcars, hp > 150)

returning a data frame containing those rows where the hp column exceeds 150:

                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8

other objects

The above pertain to data frames but other objects that can use $, [ and [[ will have their own rules. In particular if m is a matrix, e.g. m <- as.matrix(BOD), then m[, 1] is a vector, not a one column matrix, but m[, 1, drop = FALSE] is a one column matrix. m[[1]] and m[1] are both the first element of m, not the first column. m$a does not work at all.
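The matrix rules just described can be checked with a short sketch. BOD is a small data frame that ships with R (6 rows, columns Time and demand); the variable names are illustrative.

```r
m <- as.matrix(BOD)               # numeric matrix with 6 rows and 2 columns

col_vec <- m[, 1]                 # dimensions dropped: a plain vector
col_mat <- m[, 1, drop = FALSE]   # dimensions kept: a one-column matrix

is.null(dim(col_vec))             # TRUE
dim(col_mat)                      # 6 1

# A single index treats the matrix as one long vector
m[[1]] == m[1]                    # TRUE: both are the first element

# $ is not defined for matrices, so this raises an error
dollar_fails <- inherits(try(m$Time, silent = TRUE), "try-error")
dollar_fails                      # TRUE
```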

help

See ?Extract for more information. Also ?"$", ?"[" and ?"[[" all get to the same page, as well.


