How to Select Columns Programmatically in a Data.Table

How to select columns programmatically in a data.table?

This is covered in FAQ 1.1, 1.2 and 2.17.

Some possibilities:

DT[, keep, with = FALSE]
DT[, c('V1', 'V3'), with = FALSE]
DT[, c(1, 3), with = FALSE]
DT[, list(V1, V3)]
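As a minimal sketch of the four forms above: the table DT and the vector keep are not defined in the original answer, so the definitions here are assumptions chosen purely for illustration.

```r
library(data.table)

# Hypothetical table and selection vector (assumed, not from the original answer)
DT <- data.table(V1 = 1:3, V2 = 4:6, V3 = 7:9)
keep <- c("V1", "V3")

by_variable <- DT[, keep, with = FALSE]           # names held in a variable
by_name     <- DT[, c("V1", "V3"), with = FALSE]  # literal column names
by_position <- DT[, c(1, 3), with = FALSE]        # column numbers
by_list     <- DT[, list(V1, V3)]                 # unquoted column symbols

# Each result is a data.table containing only the columns V1 and V3
names(by_variable)
names(by_position)
names(by_list)
```

The first three forms take their column specification as a value, which is what makes them usable programmatically; the list(V1, V3) form hard-codes the column symbols.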

The reason DF[c('V1','V3')] works as it does for a data.frame is covered in ?`[.data.frame`

Data frames can be indexed in several modes. When [ and [[ are used
with a single vector index (x[i] or x[[i]]), they index the data frame
as if it were a list. In this usage a drop argument is ignored, with a
warning.


From data.table 1.10.2, you may use the .. prefix when subsetting columns programmatically:

When j is a symbol prefixed with .. it will be looked up in calling scope and its value taken to be column names or numbers [...] It is experimental.

Thus:

DT[ , ..keep]
# V1 V3
# 1: 1 7
# 2: 2 8
# 3: 3 9
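The quoted NEWS text says the value may hold column names or numbers, and that it is looked up in the calling scope. A short sketch of both points follows; DT, idx, and the helper pick() are hypothetical names introduced only for this example.

```r
library(data.table)

# Hypothetical table (assumed for illustration)
DT <- data.table(V1 = 1:3, V2 = 4:6, V3 = 7:9)

# The variable may hold column numbers instead of names
idx <- c(1, 3)
by_number <- DT[, ..idx]

# Inside a function, ..cols refers to the function's own argument
pick <- function(dt, cols) dt[, ..cols]
by_name <- pick(DT, c("V1", "V3"))

names(by_number)
names(by_name)
```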

Select programmatically column names in data.table

It is

data[, col, with = FALSE]
#           ^^^^

not which. Also data[, c("test", col), with = FALSE] would work, of course.


The former successfully yields

    X201804_QTY
 1:          11
 2:          12
 3:          13
 4:          14
 5:          15
 6:          16
 7:          17
 8:          18
 9:          19
10:          20
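The asker's table is not shown, so the following is only a sketch under assumptions: a hypothetical table named data with a column test and a column X201804_QTY, and a column name assembled from pieces with paste0().

```r
library(data.table)

# Hypothetical stand-in for the asker's table (assumed, not from the original)
data <- data.table(test = letters[1:10], X201804_QTY = 11:20)

# Build the column name as a string, then select by that string
col <- paste0("X", 201804, "_QTY")

one_col  <- data[, col, with = FALSE]             # just the computed column
two_cols <- data[, c("test", col), with = FALSE]  # a fixed name plus the computed one

names(one_col)
names(two_cols)
```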

How to select data.table columns whose name is variable

Add with = FALSE to the call:

library(data.table)
dt <- data.table(x = 1:10, y = 11:20, z = 1:10)
col <- "x"
dt[, c(col, "y"), with=FALSE]

Select subset of columns in data.table R

Use with=FALSE. Prefixing the character vector with ! drops the named columns instead of selecting them:

cols = paste("V", c(1,2,3,5), sep="")

dt[, !cols, with=FALSE]

I suggest going through the "Introduction to data.table" vignette.


Update: From v1.10.2 onwards, you can also select the columns named in cols with the .. prefix:

dt[, ..cols]

See the first NEWS item under v1.10.2 for additional explanation.
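Note that !cols with with=FALSE drops columns, whereas ..cols keeps them. One way to get both behaviours with the .. prefix is to compute the complement first with setdiff(). This is a sketch on a hypothetical six-column table, not part of the original answer.

```r
library(data.table)

# Hypothetical table with columns V1..V6 (assumed for illustration)
dt <- data.table(V1 = 1, V2 = 2, V3 = 3, V4 = 4, V5 = 5, V6 = 6)
cols <- paste0("V", c(1, 2, 3, 5))

# Drop the named columns using with = FALSE
dropped_old <- dt[, !cols, with = FALSE]

# Same result with the .. prefix: compute the columns to keep, then select them
keep <- setdiff(names(dt), cols)
dropped_new <- dt[, ..keep]

# Selecting (rather than dropping) the named columns
selected <- dt[, ..cols]

names(dropped_old)
names(dropped_new)
names(selected)
```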

How to select columns in a dataframe, and skip over columns that don't exist - using data.table syntax

You can use intersect to keep only the requested columns that actually exist in the data.table.

library(data.table)
dt <- data.table(a = 1:5, b = 2:6)

select_cols <- function(dt, cols) {
  # Keep only the requested columns that are present in dt
  cols <- intersect(names(dt), cols)
  dt[, ..cols]
}

select_cols(dt, c('a', 'b'))

# a b
#1: 1 2
#2: 2 3
#3: 3 4
#4: 4 5
#5: 5 6

select_cols(dt, c('a', 'c'))
# a
#1: 1
#2: 2
#3: 3
#4: 4
#5: 5

Select rows from Data.table programmatically based on column criteria

One option is to paste the columns of 'DF' together into an expression string, then parse and eval it:

DT[eval(parse(text= paste(DF$X1, DF$X2,  sep="==", collapse=" & ")))]
# x y v
#1: a 3 2

Alternatively, pass the 'X1' column as .SDcols, compare each column of .SD with the matching value in 'X2', Reduce the comparisons to a single logical vector with &, and use that vector to subset the rows:

DT[DT[, Reduce(`&`, Map(`==`, .SD, DF$X2)),.SDcols = as.character(DF$X1)]]
# x y v
#1: a 3 2
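The original question's DT and DF are not reproduced here, so the sketch below uses assumed stand-ins: a table DT with columns x, y, v, and a criteria data frame (named crit to mark it as hypothetical) whose X1 column holds column names and whose X2 column holds the values to match, stored as strings. It walks through the Reduce/Map approach step by step.

```r
library(data.table)

# Hypothetical data (assumed; the original DT and DF are not shown)
DT <- data.table(x = c("a", "a", "b"), y = c(1L, 3L, 3L), v = 1:3)
crit <- data.frame(X1 = c("x", "y"), X2 = c("a", "3"),
                   stringsAsFactors = FALSE)

# Compare each criteria column with its target value, then AND the results.
# Comparing integer y with the string "3" coerces y to character first.
keep_rows <- DT[, Reduce(`&`, Map(`==`, .SD, crit$X2)), .SDcols = crit$X1]

# Use the logical vector to subset rows
res <- DT[keep_rows]
res
```

Compared with building a string for eval(parse()), this keeps the column names and values as data, so no quoting of the values is needed.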

Select multiple columns in data.table by their numeric indices

For versions of data.table >= 1.9.8, the following all just work:

library(data.table)
dt <- data.table(a = 1, b = 2, c = 3)

# select single column by index
dt[, 2]
# b
# 1: 2

# select multiple columns by index
dt[, 2:3]
# b c
# 1: 2 3

# select single column by name
dt[, "a"]
# a
# 1: 1

# select multiple columns by name
dt[, c("a", "b")]
# a b
# 1: 1 2

For versions of data.table < 1.9.8, numerical column selection required the use of with = FALSE. See also NEWS on v1.9.8, POTENTIALLY BREAKING CHANGES, point 3.

Selecting columns from dataframe programmatically when column names have spaces

Column names that contain spaces need no special handling. For example,

library(dplyr)
library(rlang)

col_names <- c("cyl","mpg","New Var")
cc <- syms(col_names)  # one symbol per column name, spliced below with !!!
mtcars %>% mutate(`New Var`=1) %>% select(!!!cc)

Also note that select accepts a character vector of names; wrapping it in all_of() marks it unambiguously as a vector of column names:

mtcars %>% mutate(`New Var` = 1) %>% select(all_of(col_names))

R data.table syntax to create and select on the fly

Try c(list(d = d * -1), .SD) in the j argument.

j expects a list, and .SD is itself a list.

So to add a new column this way, put it into a list and combine that list with .SD using the c function.
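A minimal sketch of this idiom on a hypothetical two-column table; the table and the new column name neg_d are assumptions (a distinct name is used here so the result does not end up with two columns called d).

```r
library(data.table)

# Hypothetical table (assumed for illustration)
dt <- data.table(d = 1:3, e = 4:6)

# j returns a list: the computed column first, followed by every column in .SD
res <- dt[, c(list(neg_d = d * -1), .SD)]

names(res)
```

The original table dt is left unchanged; the new column exists only in the returned result.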

Programmatically assigning columns in data.table with dynamic column names

You're probably looking for this (note the parentheses):

dt = data.table(a = 1:5)
newcol = 'b'
dt[, (newcol) := c(NA, diff(a))]
dt
# a b
#1: 1 NA
#2: 2 1
#3: 3 1
#4: 4 1
#5: 5 1

Or maybe this:

oldcol = 'a'
dt[, (newcol) := c(NA, diff(get(oldcol)))]
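The same parenthesised form also accepts a character vector, so several dynamically named columns can be assigned in one call. The sketch below combines it with .SDcols; the table and the names oldcols and newcols are hypothetical.

```r
library(data.table)

# Hypothetical table and name vectors (assumed for illustration)
dt <- data.table(a = 1:5, z = c(2, 4, 8, 16, 32))
oldcols <- c("a", "z")
newcols <- paste0(oldcols, "_diff")

# Parentheses make data.table use the *values* of newcols as column names;
# lapply over .SD computes one new column per source column
dt[, (newcols) := lapply(.SD, function(x) c(NA, diff(x))), .SDcols = oldcols]

names(dt)
```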

