Select subset of columns in data.table R
Use with=FALSE:
cols = paste("V", c(1,2,3,5), sep="")
dt[, !cols, with=FALSE]
I suggest going through the "Introduction to data.table" vignette.
Update: From v1.10.2 onwards, you can also do:
dt[, ..cols]
See the first NEWS item under v1.10.2 for additional explanation.
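The two snippets above do different things: `!cols` with with=FALSE drops the listed columns, while `..cols` keeps them. Here is a minimal sketch of the difference, assuming a small made-up table (not the asker's data) whose columns happen to be named V1 to V6:

```r
library(data.table)

# Hypothetical table: as.data.table() on an unnamed matrix names the columns V1..V6
dt <- as.data.table(matrix(1:12, nrow = 2))
cols <- paste0("V", c(1, 2, 3, 5))

dropped <- dt[, !cols, with = FALSE]  # every column except those in cols
kept    <- dt[, ..cols]               # only the columns in cols (v1.10.2+)

names(dropped)
names(kept)
```

Here `names(dropped)` should list V4 and V6, and `names(kept)` should match `cols`.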
Selecting a subset of columns in a data.table
Use syntax very similar to that for a data.frame, but add the argument with=FALSE:
dt[, setdiff(colnames(dt),"V9"), with=FALSE]
V1 V2 V3 V4 V5 V6 V7 V8 V10
1: 1 1 1 1 1 1 1 1 1
2: 0 0 0 0 0 0 0 0 0
3: 1 1 1 1 1 1 1 1 1
4: 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 0 0 0 0 0
6: 1 1 1 1 1 1 1 1 1
The use of with=FALSE is nicely explained in the documentation for the j argument in ?data.table:
j: A single column name, single expression of column names, list() of expressions of column names, an expression or function call that evaluates to list (including data.frame and data.table which are lists, too), or (when with=FALSE) same as j in [.data.frame.
From v1.10.2 onwards it is also possible to do this as follows:
keep <- setdiff(names(dt), "V9")
dt[, ..keep]
Prefixing a symbol with .. makes data.table look it up in the calling scope (for example, the global environment) and take its value as column names or numbers.
Subsetting columns in a data.table
We need to use with = FALSE:
dt[, 1:2, with = FALSE]
This is explained in ?data.table:
with: By default with=TRUE and j is evaluated within the frame of x; column names can be used as variables. When with=FALSE j is a character vector of column names, a numeric vector of column positions to select or of the form startcol:endcol, and the value returned is always a data.table. with=FALSE is often useful in data.table to select columns dynamically.
Selecting a subset of columns in R data.table using a vector with a constant column
Combine .SDcols with c(.SD, .(new=10)):
dt <- data.table(one = 1:3, two = 2:4, three = 3:5)
parlist <- c("one", "three")
dt[, c(.SD, .(new = 10)), .SDcols = parlist]
# one three new
# <int> <int> <num>
# 1: 1 3 10
# 2: 2 4 10
# 3: 3 5 10
using %in% to subset a data.table
The expression
DT[x==a | x==b]
returns all rows in DT where the values in x and a are equal or x and b are equal. This is the desired result.
On the other hand,
DT[x%in%c(a,b)]
returns all rows where x matches any value in c(a, b), not just the corresponding value. Thus your second row appears because x == 3 and 3 appears (somewhere) in a.
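The asker's table is not shown in this excerpt, so the following is only a sketch with made-up data, built so that the second row has x == 3 and the value 3 occurs elsewhere in a. It contrasts the row-wise comparison with the pooled membership test:

```r
library(data.table)

# Hypothetical data: row 2 has x = 3, and 3 appears in a (but in row 1)
DT <- data.table(x = c(1, 3, 5), a = c(3, 0, 5), b = c(1, 9, 9))

rowwise <- DT[x == a | x == b]  # compares x with a and b in the same row
pooled  <- DT[x %in% c(a, b)]   # tests x against all values of a and b together

rowwise
pooled
```

With this data, `rowwise` keeps rows 1 and 3 only, while `pooled` also keeps row 2.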
Subsetting multiple columns of a data.table with the same column name
You can pass a logical vector to select columns.
library(data.table)
dt[, names(dt) == 'a', with = FALSE]
# a a
#1: 1 7
#2: 2 8
#3: 3 9
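The snippet above does not define dt. A self-contained sketch, assuming a table with two columns both named a (values taken from the output above) and a made-up column b in between, could look like this:

```r
library(data.table)

# Assumed table: data.table() keeps duplicate names (check.names = FALSE by default);
# column b is hypothetical
dt <- data.table(a = 1:3, b = 4:6, a = 7:9)

is_a <- names(dt) == "a"          # TRUE FALSE TRUE
res  <- dt[, is_a, with = FALSE]  # logical vector picks both "a" columns

res
```

If a logical index is not wanted, `which(names(dt) == "a")` gives the equivalent column positions.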
How to select columns programmatically in a data.table?
This is covered in FAQ 1.1, 1.2 and 2.17.
Some possibilities:
DT[, keep, with = FALSE]
DT[, c('V1', 'V3'), with = FALSE]
DT[, c(1, 3), with = FALSE]
DT[, list(V1, V3)]
The reason DF[c('V1','V3')] works as it does for a data.frame is covered in ?`[.data.frame`:
Data frames can be indexed in several modes. When [ and [[ are used with a single vector index (x[i] or x[[i]]), they index the data frame as if it were a list. In this usage a drop argument is ignored, with a warning.
From data.table 1.10.2, you may use the .. prefix when subsetting columns programmatically:
When j is a symbol prefixed with .. it will be looked up in calling scope and its value taken to be column names or numbers [...] It is experimental.
Thus:
DT[ , ..keep]
# V1 V3
# 1: 1 7
# 2: 2 8
# 3: 3 9
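To check that the possibilities listed in this answer select the same columns, here is a small sketch. DT is an assumption: V1 and V3 match the output shown above, and V2 is made up.

```r
library(data.table)

# Assumed table; only V1 and V3 are taken from the output above
DT <- data.table(V1 = 1:3, V2 = 4:6, V3 = 7:9)
keep <- c("V1", "V3")

by_var   <- DT[, keep, with = FALSE]           # names held in a variable
by_names <- DT[, c("V1", "V3"), with = FALSE]  # literal character vector
by_pos   <- DT[, c(1, 3), with = FALSE]        # column positions
by_list  <- DT[, list(V1, V3)]                 # unquoted column names
by_dots  <- DT[, ..keep]                       # .. prefix (v1.10.2+)

# Each comparison is expected to return TRUE
all.equal(by_var, by_names)
all.equal(by_var, by_pos)
all.equal(by_var, by_list)
all.equal(by_var, by_dots)
```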
How to select columns in data.table using a character vector of certain column names?
We can use the .. notation so that myVector is looked up in the calling scope and its value is used as column names or positions, much as it would work with a data.frame:
mtcarsDT[, ..myVector]
According to ?data.table:
In case of overlapping variable names inside the dataset and in the parent scope, you can use the double-dot prefix ..cols to explicitly refer to the cols variable in the parent scope and not from your dataset.
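The overlap case described in that passage can be sketched as follows. The table is made up for illustration and deliberately contains a column that is itself named cols:

```r
library(data.table)

# Hypothetical table with a column called "cols"
DT <- data.table(cols = 1:3, V1 = 4:6, V2 = 7:9)
cols <- c("V1", "V2")  # variable of the same name in the calling scope

from_table <- DT[, cols]    # the column named cols, returned as a vector
from_scope <- DT[, ..cols]  # the outer variable, so columns V1 and V2

from_table
names(from_scope)
```

Without the prefix, the column inside the table wins; with it, the variable in the calling scope is used.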
r data.table row subset with column name as a variable
I guess you are looking for get:
library(data.table)
DT <- data.table(x1=1:11, x2=11:21)
var <- "x1"
DT[get(var)==1,]
R data.table struggling with conditional subsetting when column name is predefined elsewhere
I can imagine this was very frustrating for you. I applaud the number of things you tried before posting. Here's one approach:
DT[get(column_name) == 1,]
x y
1: 1 0
2: 1 1
If you need to use column_name in j, you can use get(..column_name):
DT[,get(..column_name)]
[1] 1 1 0 0
The .. prefix instructs evaluation to occur in the parent environment.
Another approach for using a string in either i or j is with eval(as.name(column_name)):
DT[eval(as.name(column_name)) == 1]
x y
1: 1 0
2: 1 1
DT[,eval(as.name(column_name))]
[1] 1 1 0 0
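When the column name arrives as a function argument, the get() approach can be wrapped in a small helper. This is only a sketch: rows_where is a hypothetical name, and DT is assumed data chosen to be consistent with the outputs printed above.

```r
library(data.table)

# Assumed data, consistent with the outputs shown above
DT <- data.table(x = c(1, 1, 0, 0), y = c(0, 1, 0, 1))

# Hypothetical helper: keep rows where the column named by `col` equals `value`
rows_where <- function(dt, col, value) {
  dt[get(col) == value]
}

res_x <- rows_where(DT, "x", 1)
res_y <- rows_where(DT, "y", 1)

res_x
res_y
```

One caveat with this design: if the table itself has a column called col or value, get() and the comparison would see that column first, so argument names that cannot clash with column names are a safer choice.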