Selecting a Subset of Columns in a Data.Table

Select subset of columns in data.table R

Use with=FALSE. Note that the leading ! drops the columns named in cols; omit it to keep them instead:

cols = paste("V", c(1,2,3,5), sep="")

dt[, !cols, with=FALSE]

I suggest going through the "Introduction to data.table" vignette.


Update: From v1.10.2 onwards, you can also use the .. prefix instead of with=FALSE (shown here selecting the columns named in cols):

dt[, ..cols]

See the first NEWS item under v1.10.2 in the data.table NEWS file for additional explanation.
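
As a minimal sketch of the difference between keeping and dropping columns, the following uses a made-up one-row table (the table and its columns V1 to V6 are assumptions for illustration, not data from the question):

```r
library(data.table)

# Hypothetical example table with columns V1..V6 (one row is enough here).
dt <- as.data.table(setNames(as.list(1:6), paste0("V", 1:6)))
cols <- paste("V", c(1, 2, 3, 5), sep = "")

kept    <- dt[, cols, with = FALSE]   # keeps V1, V2, V3, V5
dropped <- dt[, !cols, with = FALSE]  # drops them, leaving V4 and V6
kept2   <- dt[, ..cols]               # same columns as `kept` (data.table >= 1.10.2)

names(kept)
names(dropped)
```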

Selecting a subset of columns in a data.table

Use a syntax very similar to the one for a data.frame, but add the argument with=FALSE:

dt[, setdiff(colnames(dt),"V9"), with=FALSE]
   V1 V2 V3 V4 V5 V6 V7 V8 V10
1:  1  1  1  1  1  1  1  1   1
2:  0  0  0  0  0  0  0  0   0
3:  1  1  1  1  1  1  1  1   1
4:  0  0  0  0  0  0  0  0   0
5:  0  0  0  0  0  0  0  0   0
6:  1  1  1  1  1  1  1  1   1

The use of with=FALSE is nicely explained in the documentation for the j argument in ?data.table:

j: A single column name, single expression of column names, list() of expressions of column names, an expression or function call that evaluates to list (including data.frame and data.table which are lists, too), or (when with=FALSE) same as j in [.data.frame.


From v1.10.2 onwards it is also possible to do this as follows:

keep <- setdiff(names(dt), "V9")
dt[, ..keep]

Prefixing a symbol with .. causes it to be looked up in the calling scope (for example, the global environment when the call is made at the top level), and its value is taken to be column names or numbers.
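
The calling scope need not be the global environment. A small sketch, assuming a made-up table and a hypothetical helper function drop_v9 whose local variable keep exists only inside the function:

```r
library(data.table)

dt <- data.table(V1 = 1:3, V2 = 4:6, V9 = 7:9)

# Hypothetical helper: `keep` is local to the function, not a global variable.
drop_v9 <- function(x) {
  keep <- setdiff(names(x), "V9")
  x[, ..keep]  # ..keep is resolved in the scope that calls `[`, here this function
}

res <- drop_v9(dt)
names(res)
```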

Subsetting multiple columns of a data.table with the same column name

You can pass a logical vector to select columns.

library(data.table)
dt[, names(dt) == 'a', with = FALSE]

#    a a
# 1: 1 7
# 2: 2 8
# 3: 3 9
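
For a self-contained version of the same idea, here is a sketch with a made-up table that has two columns named a (the data are an assumption chosen to mirror the output above):

```r
library(data.table)

# Hypothetical table with two columns both named "a"
# (data.table() keeps duplicate names because check.names defaults to FALSE).
dt <- data.table(a = 1:3, b = 4:6, a = 7:9)

# names(dt) == "a" is TRUE, FALSE, TRUE, so both "a" columns are returned.
both_a <- dt[, names(dt) == "a", with = FALSE]
names(both_a)
```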

Subsetting columns in a data.table

Use with = FALSE (data.table versions before 1.9.8 require it for this form; later versions also accept dt[, 1:2] directly):

dt[, 1:2, with = FALSE]

This is explained in ?data.table:

with: By default with=TRUE and j is evaluated within the frame of x;
column names can be used as variables.

When with=FALSE j is a character vector of column names, a numeric
vector of column positions to select or of the form startcol:endcol,
and the value returned is always a data.table. with=FALSE is often
useful in data.table to select columns dynamically.
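
To illustrate what selecting columns dynamically can look like, here is a minimal sketch in which the column names are computed at run time. The table and its column names are made up for illustration:

```r
library(data.table)

# Hypothetical table; the column names are assumptions for this example.
dt <- data.table(id = 1:3, x_a = 4:6, x_b = 7:9, y = c(0.1, 0.2, 0.3))

# Character vector of names computed at run time.
x_cols <- grep("^x_", names(dt), value = TRUE)
by_name <- dt[, x_cols, with = FALSE]

# Numeric positions in startcol:endcol form.
by_pos <- dt[, 2:3, with = FALSE]

names(by_name)
names(by_pos)
```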

Selecting a subset of columns in R data.table using a vector with a constant column

Combine .SDcols with c(.SD, .(new=10)):

dt <- data.table(one = 1:3, two = 2:4, three = 3:5)
parlist <- c("one", "three")

dt[, c(.SD, .(new = 10)), .SDcols = parlist]
#      one three   new
#    <int> <int> <num>
# 1:     1     3    10
# 2:     2     4    10
# 3:     3     5    10

Efficient way to subset data.table based on value in any of selected columns

One option is to specify the columns of interest in .SDcols and loop over the Subset of Data.table (.SD) to generate a list of logical vectors, one per column. Reduce then combines them with | into a single logical vector, which is used to subset the rows:

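# cols: character vector of the column names to check.
# test1: assumed to hold the result of the original approach being compared against.
i1 <- dt[, Reduce(`|`, lapply(.SD, `==`, 10)), .SDcols = cols]
test2 <- dt[i1]
identical(test1, test2)
#[1] TRUE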
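
Since dt, cols and test1 are not defined in this answer, here is a self-contained sketch of the same technique. The data, the column names and the explicit filter used for test1 are all assumptions for illustration:

```r
library(data.table)

# Hypothetical data: keep rows where any of the columns in `cols` equals 10.
dt <- data.table(id = 1:4,
                 a = c(10, 2, 3, 4),
                 b = c(1, 10, 3, 4),
                 z = c(10, 10, 10, 10))  # z is deliberately not in `cols`
cols <- c("a", "b")

# Explicit version, written out column by column.
test1 <- dt[a == 10 | b == 10]

# Programmatic version: one logical vector per column, OR-ed together.
i1 <- dt[, Reduce(`|`, lapply(.SD, `==`, 10)), .SDcols = cols]
test2 <- dt[i1]

test2$id
```

Only the columns listed in cols take part in the comparison, so the constant column z does not cause every row to match.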

How to select columns programmatically in a data.table?

This is covered in FAQ 1.1, 1.2 and 2.17.

Some possibilities:

DT[, keep, with = FALSE]
DT[, c('V1', 'V3'), with = FALSE]
DT[, c(1, 3), with = FALSE]
DT[, list(V1, V3)]
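
A quick way to check that these forms select the same columns is to run them on a small table. The table DT and the vector keep below are made up for illustration:

```r
library(data.table)

# Hypothetical table and column vector, using the names from the calls above.
DT <- data.table(V1 = 1:3, V2 = 4:6, V3 = 7:9)
keep <- c("V1", "V3")

r1 <- DT[, keep, with = FALSE]           # names held in a variable
r2 <- DT[, c('V1', 'V3'), with = FALSE]  # literal names
r3 <- DT[, c(1, 3), with = FALSE]        # positions
r4 <- DT[, list(V1, V3)]                 # unquoted names evaluated inside DT

lapply(list(r1, r2, r3, r4), names)
```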

The reason DF[c('V1','V3')] works as it does for a data.frame is covered in ?`[.data.frame`:

Data frames can be indexed in several modes. When [ and [[ are used
with a single vector index (x[i] or x[[i]]), they index the data frame
as if it were a list. In this usage a drop argument is ignored, with a
warning.
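
A short base R sketch of the list-style indexing described in that passage, using a made-up data frame:

```r
# Base R only; the data frame is a hypothetical example.
DF <- data.frame(V1 = 1:3, V2 = 4:6, V3 = 7:9)

a <- DF[c("V1", "V3")]     # single index: treated like list subsetting
b <- DF[, c("V1", "V3")]   # matrix-style index on columns

one_list   <- DF["V1"]     # still a data.frame with one column
one_matrix <- DF[, "V1"]   # drops to a plain vector by default

class(one_list)
class(one_matrix)
```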


From data.table 1.10.2, you may use the .. prefix when subsetting columns programmatically:

When j is a symbol prefixed with .. it will be looked up in calling scope and its value taken to be column names or numbers [...] It is experimental.

Thus:

DT[ , ..keep]
#    V1 V3
# 1:  1  7
# 2:  2  8
# 3:  3  9

Using %in% to subset a data.table

The expression

 DT[x==a | x==b]

returns all rows in DT where, within the same row, x equals a or x equals b. This is the desired result.

On the other hand

 DT[x%in%c(a,b)]

returns all rows where x matches any value in c(a, b), not just the corresponding value. Thus your second row appears because x == 3 and 3 appears (somewhere) in a.
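
The difference is easy to reproduce. A minimal sketch with made-up data, chosen so that the two expressions disagree on the second row (these values are an assumption and are not the data from the question):

```r
library(data.table)

# Hypothetical data: in row 2, x is 3, which equals neither a (2) nor b (9)
# in that row, but 3 does appear in column a on row 3.
DT <- data.table(x = c(1, 3, 5), a = c(1, 2, 3), b = c(9, 9, 5))

rowwise  <- DT[x == a | x == b]   # compares within each row
anywhere <- DT[x %in% c(a, b)]    # compares x against the pooled values of a and b

nrow(rowwise)
nrow(anywhere)
```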

Subset a data.table based on column value where column is selected based on sequence

You can do this:

DT[DT[[1]] %in% letters[1:2]]
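
A self-contained sketch of this pattern, assuming a made-up table and a hypothetical variable pos that holds the column position:

```r
library(data.table)

# Hypothetical table; the filter column is chosen by position, not by name.
DT <- data.table(grp = c("a", "b", "c", "a"), val = 1:4)

pos <- 1L
res <- DT[DT[[pos]] %in% letters[1:2]]  # rows whose first column is "a" or "b"

res$val
```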

