The difference between bracket [ ] and double bracket [[ ]] for accessing the elements of a list or dataframe
The R Language Definition is handy for answering these types of questions:
- http://cran.r-project.org/doc/manuals/R-lang.html#Indexing
R has three basic indexing operators, with syntax displayed by the following examples
x[i]
x[i, j]
x[[i]]
x[[i, j]]
x$a
x$"a"
For vectors and matrices the [[ forms are rarely used, although they have some slight semantic differences from the [ form (e.g. it drops any names or dimnames attribute, and partial matching is used for character indices). When indexing multi-dimensional structures with a single index, x[[i]] or x[i] will return the i-th sequential element of x.
For lists, one generally uses [[ to select any single element, whereas [ returns a list of the selected elements.
The [[ form allows only a single element to be selected using integer or character indices, whereas [ allows indexing by vectors. Note though that for a list, the index can be a vector and each element of the vector is applied in turn to the list, the selected component, the selected component of that component, and so on. The result is still a single element.
Difference between single and double bracket in calling columns
A data.frame is a list with columns of equal length. With [[ we extract a column as a vector, while with [ we get a data.frame with one or more columns. Another way to return a vector with [ is to add the comma, which marks the index explicitly as a column index; the default drop = TRUE then applies to the data.frame:
myDataset[, 1]
If we still want a single-column data.frame:
myDataset[, 1, drop = FALSE]
What is the difference between [ ] and [[ ]] in R?
[ ] generally returns an object of the same class as the original (among the basic object classes) and can select more than one element.
[[ ]] extracts exactly one element from a list or data frame; the returned object (among the basic object classes) is not necessarily a list or data frame.
single vs double square brackets in python
The list inside a list is called a nested list. The following list, my_movies_1, has length 1, and its inner list has length 9. The inner list is accessed using my_movies_1[0].
my_movies_1 = [['How I Met your Mother', 'Friends', 'sillicon valley','The Wire','breakin bad', 'Family Guy','Game of Throne','South park', 'Rick and Morty']]
On the other hand, the following list is not a nested list and has a length of 9
my_movies_2 = ['How I Met your Mother', 'Friends', 'sillicon valley','The Wire','breakin bad','Family Guy','Game of Throne','South park', 'Rick and Morty']
How are they related? my_movies_1[0] gives you a list equal to my_movies_2.
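The relationship between the two lists can be checked directly. The sketch below uses shortened lists for brevity; the names flat and nested are illustrative stand-ins for my_movies_2 and my_movies_1, not part of the original answer.

```python
# Shortened, illustrative stand-ins for my_movies_2 and my_movies_1.
flat = ['Friends', 'The Wire', 'Family Guy']      # a plain list of 3 strings
nested = [['Friends', 'The Wire', 'Family Guy']]  # a list holding one inner list

print(len(nested))        # length of the outer list
print(len(nested[0]))     # length of the inner list
print(nested[0] == flat)  # the inner list compares equal to the flat list
print(nested[0][1])       # chained indexing reaches an element of the inner list
```

Note that the double brackets here are not a special operator: the outer pair builds a list and the inner pair builds the single element it contains.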
The difference between double bracket `[[...]]` and single bracket `[..]` indexing in Pandas
Consider this:
Source DF:
In [79]: df
Out[79]:
Brains Bodies
0 42 34
1 32 23
Selecting one column results in a pandas Series:
In [80]: df['Brains']
Out[80]:
0 42
1 32
Name: Brains, dtype: int64
In [81]: type(df['Brains'])
Out[81]: pandas.core.series.Series
Selecting a subset of columns results in a DataFrame:
In [82]: df[['Brains']]
Out[82]:
Brains
0 42
1 32
In [83]: type(df[['Brains']])
Out[83]: pandas.core.frame.DataFrame
Conclusion: the second approach lets us select multiple columns from the DataFrame, whereas the first selects only a single column.
Demo:
In [84]: df = pd.DataFrame(np.random.rand(5,6), columns=list('abcdef'))
In [85]: df
Out[85]:
a b c d e f
0 0.065196 0.257422 0.273534 0.831993 0.487693 0.660252
1 0.641677 0.462979 0.207757 0.597599 0.117029 0.429324
2 0.345314 0.053551 0.634602 0.143417 0.946373 0.770590
3 0.860276 0.223166 0.001615 0.212880 0.907163 0.437295
4 0.670969 0.218909 0.382810 0.275696 0.012626 0.347549
In [86]: df[['e','a','c']]
Out[86]:
e a c
0 0.487693 0.065196 0.273534
1 0.117029 0.641677 0.207757
2 0.946373 0.345314 0.634602
3 0.907163 0.860276 0.001615
4 0.012626 0.670969 0.382810
and if we specify only one column in the list we will get a DataFrame with one column:
In [87]: df[['e']]
Out[87]:
e
0 0.487693
1 0.117029
2 0.946373
3 0.907163
4 0.012626
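The behaviour shown in the session above can be condensed into a short script. This is a minimal sketch that rebuilds the two-column source DataFrame from the first example; the variable names single, frame and both are illustrative.

```python
import pandas as pd

# Rebuild the small source DataFrame from the first example.
df = pd.DataFrame({'Brains': [42, 32], 'Bodies': [34, 23]})

single = df['Brains']            # a single label selects a Series
frame = df[['Brains']]           # a list with one label selects a one-column DataFrame
both = df[['Bodies', 'Brains']]  # a list of labels selects columns in the given order

print(type(single).__name__)
print(type(frame).__name__)
print(frame.shape)
print(list(both.columns))
```

In other words, the inner brackets are an ordinary Python list passed to the indexing operator, and passing a list is what makes pandas return a DataFrame.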
R difference between [[]] and []
all_data[1]=list(5,6)
gives you a warning (not an error) that the lengths aren't the same. You can't replace a one-element slice of a list with a two-element list. It's like trying x <- 1; x[1] <- 1:2. But you can set one element of a list to contain another list, which is why all_data[[1]]=list(5,6) works.
Difference between [] and $ operators for subsetting
Below we use a one-row data frame to keep the output brief:
mtcars1 <- mtcars[1, ]
Note the differences among these. We can use class, as in class(mtcars["hp"]), to investigate the class of the return value.
The first two correspond to the code in the question and return a data frame and a plain vector respectively. The key differences between [ and $ are that [ (1) can specify multiple columns, (2) allows passing a variable as the index and (3) returns a data frame (although see the examples later on), whereas $ (1) can only specify a single column, (2) requires the index to be hard coded and (3) returns a vector.
mtcars1["hp"] # returns data frame
## hp
## Mazda RX4 110
mtcars1$hp # returns plain vector
## [1] 110
Other examples where the index is a single element. Note that the first and second examples below are actually the same, as drop = TRUE is the default.
mtcars1[, "hp"] # returns plain vector
## [1] 110
mtcars1[, "hp", drop = TRUE] # returns plain vector
## [1] 110
mtcars1[, "hp", drop = FALSE] # returns data frame
## hp
## Mazda RX4 110
There is also the [[ operator, which is like the $ operator except that it can accept a variable as the index, whereas $ requires the index to be hard coded:
mtcars1[["hp"]] # returns plain vector
## [1] 110
Other examples where the index specifies multiple elements. $ and [[ cannot be used with multiple elements, so these examples only use [:
mtcars1[c("mpg", "hp")] # returns data frame
## mpg hp
## Mazda RX4 21 110
mtcars1[, c("mpg", "hp")] # returns data frame
## mpg hp
## Mazda RX4 21 110
mtcars1[, c("mpg", "hp"), drop = FALSE] # returns data frame
## mpg hp
## Mazda RX4 21 110
mtcars1[, c("mpg", "hp"), drop = TRUE] # returns list
## $mpg
## [1] 21
##
## $hp
## [1] 110
[
mtcars[foo] can return more than one column if foo is a vector with more than one element, e.g. mtcars[c("hp", "mpg")], and in all cases the return value is a data.frame, even if foo has only one element (as it does in the question).
There is also mtcars[, foo, drop = FALSE], which returns the same value as mtcars[foo], so it always returns a data frame. With drop = TRUE it returns the column itself if foo specifies a single column. If foo specifies multiple columns, the result is a list rather than a data.frame only when a single row is left, as with the one-row mtcars1 above; with several rows it stays a data frame.
[[
On the other hand, mtcars[[foo]] only works if foo has one element, and it returns that column, not a data frame.
$
mtcars$hp also only works for a single column, like [[, and returns the column, not a data frame containing that column.
mtcars$hp is like mtcars[["hp"]]; however, there is no way to pass a variable index with $. One can only hard-code the index with $.
subset
Note that this works:
subset(mtcars, hp > 150)
returning a data frame containing those rows where the hp column exceeds 150:
mpg cyl disp hp drat wt qsec vs am gear carb
Hornet Sportabout 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
Duster 360 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
Merc 450SE 16.4 8 275.8 180 3.07 4.070 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 3.730 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 3.780 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 5.250 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 5.424 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 5.345 17.42 0 0 3 4
Camaro Z28 13.3 8 350.0 245 3.73 3.840 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
Ford Pantera L 15.8 8 351.0 264 4.22 3.170 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 2.770 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 3.570 14.60 0 1 5 8
other objects
The above pertains to data frames, but other objects that can use $, [ and [[ will have their own rules. In particular, if m is a matrix, e.g. m <- as.matrix(BOD), then m[, 1] is a vector, not a one-column matrix, but m[, 1, drop = FALSE] is a one-column matrix. m[[1]] and m[1] are both the first element of m, not the first column. m$a does not work at all.
help
See ?Extract for more information. Also, ?"$", ?"[" and ?"[[" all lead to the same page.