Apply a function to a subset of data.table columns, by column-indices instead of name
The idiomatic approach is to use .SD and .SDcols.
You can force the LHS of := to be evaluated in the parent frame by wrapping it in (), where b is a vector of column names or indices:
a[, (b) := lapply(.SD, as.numeric), .SDcols = b]
For columns 2:3
a[, 2:3 := lapply(.SD, as.numeric), .SDcols = 2:3]
or
mysubset <- 2:3
a[, (mysubset) := lapply(.SD, as.numeric), .SDcols = mysubset]
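As a self-contained illustration of the pattern above, the following sketch converts two character columns to numeric by position. The table a, its column names, and the variable idx are made up for this example.

```r
library(data.table)

# Hypothetical table: columns 2 and 3 hold numbers stored as character
a <- data.table(id = c("x", "y", "z"),
                v1 = c("1", "2", "3"),
                v2 = c("4.5", "5.5", "6.5"))

idx <- 2:3  # column positions to convert

# (idx) on the LHS is evaluated as a variable rather than taken as a column name;
# .SDcols restricts .SD to the same columns
a[, (idx) := lapply(.SD, as.numeric), .SDcols = idx]

sapply(a, class)
```

Because := modifies a by reference, no reassignment is needed; the first column is left untouched.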
data.table: transforming subset of columns with a function, row by row
If what you need is really to scale by row, you can try doing it in 2 steps:
# compute mean/sd:
mean_sd <- DT[, .(mean(unlist(.SD)), sd(unlist(.SD))), by=1:nrow(DT), .SDcols=grep("keyword",colnames(DT))]
# scale
DT[, grep("keyword",colnames(DT), value=TRUE) := lapply(.SD, function(x) (x-mean_sd$V1)/mean_sd$V2), .SDcols=grep("keyword",colnames(DT))]
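To make the two steps easier to try out, here is a minimal runnable sketch on a small made-up table. The column names (keyword_a and so on) and the helper names cols, m and s are assumptions for illustration only.

```r
library(data.table)

# Hypothetical data: three columns whose names contain "keyword"
DT <- data.table(id = 1:3,
                 keyword_a = c(1, 2, 3),
                 keyword_b = c(2, 4, 6),
                 keyword_c = c(3, 6, 9))

cols <- grep("keyword", names(DT), value = TRUE)

# Step 1: per-row mean and sd across the selected columns
mean_sd <- DT[, .(m = mean(unlist(.SD)), s = sd(unlist(.SD))),
              by = seq_len(nrow(DT)), .SDcols = cols]

# Step 2: scale every selected column with the row-wise statistics
DT[, (cols) := lapply(.SD, function(x) (x - mean_sd$m) / mean_sd$s),
   .SDcols = cols]

DT
```

Storing the column names once in cols avoids repeating the grep() call, and naming the summary columns avoids relying on the default V1 and V2.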
Select subset of columns in data.table R
Use with=FALSE:
cols = paste("V", c(1,2,3,5), sep="")
dt[, !cols, with=FALSE]
I suggest going through the "Introduction to data.table" vignette.
Update: From v1.10.2 onwards, you can also do:
dt[, ..cols]
See the first NEWS item under v1.10.2 for additional explanation.
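Note that the two forms shown in this answer do different things: the ! version drops the named columns, while ..cols keeps them. The sketch below, on a made-up five-column table, shows both side by side, plus one way to drop columns with the .. prefix via setdiff().

```r
library(data.table)

# Hypothetical table with columns V1..V5
dt <- data.table(V1 = 1:2, V2 = 3:4, V3 = 5:6, V4 = 7:8, V5 = 9:10)
cols <- paste0("V", c(1, 2, 3, 5))

kept    <- dt[, ..cols]               # keep only the columns named in cols
dropped <- dt[, !cols, with = FALSE]  # drop them and keep the rest

# Dropping with the .. prefix: build the complement first
rest     <- setdiff(names(dt), cols)
dropped2 <- dt[, ..rest]
```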
data.table in r : subset using column index
We can get the row index with .I and use that to subset the DT:
DT[DT[, .I[.SD==2], .SDcols = 1]]
# A B C
#1: 2 3 4
data
DT <- data.table(A = 1:5, B = 2:6, C = 3:7)
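If the column position is held in a variable, the same idea carries over. The sketch below uses a hypothetical variable col_idx and compares the .I approach with plain [[ extraction, which should select the same rows.

```r
library(data.table)

DT <- data.table(A = 1:5, B = 2:6, C = 3:7)
col_idx <- 1  # hypothetical: position of the column to test

# Row numbers where the chosen column equals 2, then subset by them
by_I <- DT[DT[, .I[.SD == 2], .SDcols = col_idx]]

# Alternative: extract the column by position and build a logical index
by_vec <- DT[DT[[col_idx]] == 2]
```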
Selecting a subset of columns in a data.table
Use syntax very similar to that for a data.frame, but add the argument with=FALSE:
dt[, setdiff(colnames(dt),"V9"), with=FALSE]
V1 V2 V3 V4 V5 V6 V7 V8 V10
1: 1 1 1 1 1 1 1 1 1
2: 0 0 0 0 0 0 0 0 0
3: 1 1 1 1 1 1 1 1 1
4: 0 0 0 0 0 0 0 0 0
5: 0 0 0 0 0 0 0 0 0
6: 1 1 1 1 1 1 1 1 1
The use of with=FALSE is nicely explained in the documentation for the j argument in ?data.table:
j: A single column name, single expression of column names, list() of expressions of column names, an expression or function call that evaluates to list (including data.frame and data.table which are lists, too), or (when with=FALSE) same as j in [.data.frame.
From v1.10.2 onwards it is also possible to do this as follows:
keep <- setdiff(names(dt), "V9")
dt[, ..keep]
Prefixing a symbol with .. makes data.table look it up in the calling scope (e.g. the Global Environment) and take its value to be column names or numbers.
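The calling scope need not be the Global Environment. As a small sketch, the hypothetical helper drop_cols() below builds keep as a local variable and ..keep still resolves, because the lookup happens in the frame that calls [.

```r
library(data.table)

dt <- data.table(V1 = 1:3, V2 = 4:6, V9 = 7:9)

# Hypothetical helper: `keep` exists only inside the function
drop_cols <- function(x, drop) {
  keep <- setdiff(names(x), drop)
  x[, ..keep]  # ..keep is looked up in this function's frame
}

res <- drop_cols(dt, "V9")
```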
data.table assignment by reference using lapply and also returning the rest of the columns
Try
x[, c("a", "b") := lapply(.SD, overwriteNA), .SDcols = c("a", "b")]
Edit: per the OP's additional request:
myCols <- c("a", "b")
x[, (myCols) := lapply(.SD, overwriteNA), .SDcols = myCols]
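The OP's overwriteNA() is not shown here, so the sketch below substitutes a hypothetical stand-in that replaces NA with 0. It illustrates that := updates only the chosen columns while the rest of the table is kept as-is.

```r
library(data.table)

# Hypothetical stand-in for the OP's overwriteNA(): replace NA with 0
overwriteNA <- function(v) {
  v[is.na(v)] <- 0
  v
}

x <- data.table(a = c(1, NA, 3), b = c(NA, 2, 3), c = c("u", NA, "w"))
myCols <- c("a", "b")

# := works by reference; the trailing [] just prints the full updated table
x[, (myCols) := lapply(.SD, overwriteNA), .SDcols = myCols][]
```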
Select multiple columns in data.table by their numeric indices
For versions of data.table >= 1.9.8, the following all just work:
library(data.table)
dt <- data.table(a = 1, b = 2, c = 3)
# select single column by index
dt[, 2]
# b
# 1: 2
# select multiple columns by index
dt[, 2:3]
# b c
# 1: 2 3
# select single column by name
dt[, "a"]
# a
# 1: 1
# select multiple columns by name
dt[, c("a", "b")]
# a b
# 1: 1 2
For versions of data.table < 1.9.8 (for which numerical column selection required the use of with = FALSE), see the previous version of this answer. See also NEWS on v1.9.8, POTENTIALLY BREAKING CHANGES, point 3.
Extract columns from data table by numeric indices stored in a vector
We can use double dots (..) before the object 'a' to extract the columns:
dt[, ..a]
# col4 col5 col6
#1: 4 5 6
#2: 5 6 7
#3: 6 7 8
#4: 7 8 9
Another option is with = FALSE:
dt[, a, with = FALSE]
data
dt <- data.table(col1 = 1:4, col2 = 2:5, col3 = 3:6, col4 = 4:7, col5 = 5:8, col6 = 6:9)
Selecting columns of a data.table using a vector of column names or column positions without using with = F
An option is to use double dots
DT[, ..mycols]
# A C
#1: 0.1188208 -0.17328827
#2: -0.5622505 0.84231231
#3: 0.8111072 -1.59802306
#4: 0.7968823 2.08468489
# ...
Or specify it in .SDcols
DT[, .SD, .SDcols = mycols]
or else use with = FALSE, as the OP mentioned in the post.
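For comparison, here is a small sketch running all three options on the same made-up table; DT and mycols below are illustrative stand-ins for the OP's objects, and the three results should contain the same columns.

```r
library(data.table)

set.seed(1)  # arbitrary seed so the example is reproducible
DT <- data.table(A = rnorm(4), B = rnorm(4), C = rnorm(4))
mycols <- c("A", "C")  # could equally be positions, e.g. c(1, 3)

r1 <- DT[, ..mycols]                 # double-dot prefix
r2 <- DT[, .SD, .SDcols = mycols]    # via .SDcols
r3 <- DT[, mycols, with = FALSE]     # with = FALSE
```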