Accessing columns in data.table using a character vector of column names
You can use the data.table syntax .., which "looks up one level" (as in the Unix terminal) for the variable:
> all.equal(DT[,list(x,y)], DT[, ..cols])
[1] TRUE
> all.equal(DT[,.SD[,list(x,y)][min(v)]], DT[,.SD[ ,min(v)], .SDcols = cols])
[1] TRUE
More details under FAQ 1.6 I believe: http://datatable.r-forge.r-project.org/datatable-faq.pdf
Selecting columns of a data.table using a vector of column names or column positions without using with = F
An option is to use double dots
DT[, ..mycols]
# A C
#1: 0.1188208 -0.17328827
#2: -0.5622505 0.84231231
#3: 0.8111072 -1.59802306
#4: 0.7968823 2.08468489
# ...
Or specify it in .SDcols:
DT[, .SD, .SDcols = mycols]
or else use with = FALSE, as the OP mentioned in the post.
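As a quick check that the three forms agree, here is a minimal sketch on a made-up table. The data and the values in mycols are assumptions for illustration, not the asker's data.

```r
library(data.table)

# Hypothetical data, for illustration only
DT <- data.table(A = 1:3, B = letters[1:3], C = c(0.5, 1.5, 2.5))
mycols <- c("A", "C")

by_dots   <- DT[, ..mycols]               # look up mycols outside DT
by_sdcols <- DT[, .SD, .SDcols = mycols]  # restrict .SD to mycols
by_with   <- DT[, mycols, with = FALSE]   # treat j as a vector of column names

# All three return a data.table with columns A and C
all.equal(by_dots, by_sdcols)
all.equal(by_dots, by_with)
```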
How to use R object of character vector as column names when splitting columns of a data.table?
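You can use (namesForm) := instead of namesForm :=. The parentheses make data.table evaluate namesForm and use its contents as the column names; without them, namesForm is taken as the literal name of one new column.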
Example:
test2 <- copy(test)
namesForm <- c("GT", "DP", "RO", "QR", "AO", "QA", "GL")
str(test[, c("GT", "DP", "RO", "QR", "AO", "QA", "GL") := tstrsplit(value, ":", fixed=TRUE)])
# Classes ‘data.table’ and 'data.frame': 10 obs. of 9 variables:
# $ POS : num 254 280 303 22 105 173 230 235 257 258
# $ value: chr "0/1:15:3:123:12:478:-38.8484,0,-6.94934" "0/0:15:15:577:0:0:0,-4.51545,-52.25" "0/0:13:13:276:0:0:0,-3.91339,-25.0455" "0/0:367:347:13643:0:0:0,-104.457,-1226.73" ...
# $ GT : chr "0/1" "0/0" "0/0" "0/0" ...
# $ DP : chr "15" "15" "13" "367" ...
# $ RO : chr "3" "15" "13" "347" ...
# $ QR : chr "123" "577" "276" "13643" ...
# $ AO : chr "12" "0" "0" "0" ...
# $ QA : chr "478" "0" "0" "0" ...
# $ GL : chr "-38.8484,0,-6.94934" "0,-4.51545,-52.25" "0,-3.91339,-25.0455" "0,-104.457,-1226.73" ...
# - attr(*, ".internal.selfref")=<externalptr>
str(test2[, (namesForm) := tstrsplit(value, ":", fixed=TRUE)])
# Classes ‘data.table’ and 'data.frame': 10 obs. of 9 variables:
# $ POS : num 254 280 303 22 105 173 230 235 257 258
# $ value: chr "0/1:15:3:123:12:478:-38.8484,0,-6.94934" "0/0:15:15:577:0:0:0,-4.51545,-52.25" "0/0:13:13:276:0:0:0,-3.91339,-25.0455" "0/0:367:347:13643:0:0:0,-104.457,-1226.73" ...
# $ GT : chr "0/1" "0/0" "0/0" "0/0" ...
# $ DP : chr "15" "15" "13" "367" ...
# $ RO : chr "3" "15" "13" "347" ...
# $ QR : chr "123" "577" "276" "13643" ...
# $ AO : chr "12" "0" "0" "0" ...
# $ QA : chr "478" "0" "0" "0" ...
# $ GL : chr "-38.8484,0,-6.94934" "0,-4.51545,-52.25" "0,-3.91339,-25.0455" "0,-104.457,-1226.73" ...
# - attr(*, ".internal.selfref")=<externalptr>
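The difference between the bare and the parenthesised left-hand side is easier to see on a tiny table. This is only a sketch: d1, d2, nms and the data are hypothetical names chosen for the illustration.

```r
library(data.table)

# Hypothetical data, for illustration only
d1 <- data.table(value = c("a:1", "b:2"))
d2 <- copy(d1)
nms <- c("id", "num")

# Bare symbol on the left: creates ONE column literally called "nms"
d1[, nms := "x"]

# Parenthesised: nms is evaluated, giving two columns "id" and "num"
d2[, (nms) := tstrsplit(value, ":", fixed = TRUE)]

names(d1)  # "value" "nms"
names(d2)  # "value" "id" "num"
```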
Concatenating a vector of column names in R data.table
We can use mget to return the values of the elements in 'cols' as a list:
dt[, slice := do.call(paste, mget(cols))]
head(dt, 2)
# mpg cyl disp hp drat wt qsec vs am gear carb slice
#1: 21 6 160 110 3.9 2.620 16.46 0 1 4 4 110 21 6
#2: 21 6 160 110 3.9 2.875 17.02 0 1 4 4 110 21 6
Or another option is to specify the 'cols' in .SDcols and paste the .SD:
dt[, slice := do.call(paste, .SD), .SDcols = cols]
head(dt, 2)
# mpg cyl disp hp drat wt qsec vs am gear carb slice
#1: 21 6 160 110 3.9 2.620 16.46 0 1 4 4 110 21 6
#2: 21 6 160 110 3.9 2.875 17.02 0 1 4 4 110 21 6
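For a version that can be run without the original dt, the following sketch applies both approaches to a small made-up table and compares them. The data and column names are assumptions for illustration.

```r
library(data.table)

# Hypothetical data, for illustration only
dt <- data.table(a = c("x", "y"), b = 1:2, c = c(TRUE, FALSE))
cols <- c("a", "b")

# mget() fetches the columns named in cols as a list
via_mget <- dt[, do.call(paste, mget(cols))]

# .SDcols restricts .SD to the same columns
via_sd <- dt[, do.call(paste, .SD), .SDcols = cols]

via_sd                      # "x 1" "y 2"
identical(via_mget, via_sd)
```

Because do.call passes the list elements as named arguments, a column that happened to be called sep or collapse would be read by paste as that argument rather than as data.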
How to select columns in data.table using a character vector of certain column names?
We can use the .. notation to look up myVector in the calling scope and use it as a vector of column names (or positions), like it would work in a data.frame:
mtcarsDT[, ..myVector]
According to ?data.table: in case of overlapping variable names inside the dataset and in the parent scope, you can use the double dot prefix ..cols to explicitly refer to the cols variable in the parent scope and not the one from your dataset.
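The overlap case can be sketched as follows. The table deliberately has a column called cols, and a variable of the same name exists outside it; both are hypothetical and chosen only to show the clash.

```r
library(data.table)

# Hypothetical table that happens to have a column called "cols"
DT <- data.table(cols = 1:3, x = 4:6, y = 7:9)
cols <- c("x", "y")   # variable in the calling scope with the same name

inner <- DT[, cols]     # the column inside DT: 1 2 3
outer <- DT[, ..cols]   # columns x and y, named by the outer variable

inner
names(outer)  # "x" "y"
```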
Nice way to group data in a `data.table` when the new column name is given as a character vector
Although this is probably not what you are looking for, you could use setNames inside j, wrapped around .(z = mean(y)).
library(data.table)
dt[, setNames(.(z = mean(y)), agg_col_name), by = x]
Or use setnames after doing the summary:
setnames(dt[, mean(y), by = x], 'V1', agg_col_name)[]
Output
x avg
1: 1 0.5626526
2: 2 0.3549653
3: 3 -0.2861405
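Since the dt used above is not shown, here is a self-contained sketch of both approaches on made-up data. It writes list() instead of the .() alias inside setNames, which is equivalent; the data are an assumption for illustration.

```r
library(data.table)

# Hypothetical data, for illustration only
dt <- data.table(x = c(1, 1, 2, 2), y = c(1, 3, 5, 9))
agg_col_name <- "avg"

# Name the aggregate inside j
res1 <- dt[, setNames(list(mean(y)), agg_col_name), by = x]

# Aggregate first (default name V1), then rename by reference
res2 <- setnames(dt[, mean(y), by = x], "V1", agg_col_name)

names(res1)  # "x" "avg"
res1$avg     # 2 7
```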
However, as mentioned in the comments, it is easier to do with the dev version of data.table. You can see more about the development of this feature at programming on data.table #4304 (https://github.com/Rdatatable/data.table/pull/4304).
# Latest development version:
data.table::update.dev.pkg()
library(data.table)
dt[, .(z = mean(y)), by = x, env = list(z=agg_col_name)]
# x avg
#1: 1 -0.1640783
#2: 2 0.5375794
#3: 3 0.1539785
Select matching columns from a data table using a list of column names
Try dat[, ..col.list]. The .. signals to data.table to look in the parent frame (i.e. the environment where dat is located) rather than within dat itself.
Access data.table columns with strings
You can use get() as the j argument using single brackets:
library(data.table)
dt <- data.table(iris)
dt[, get("Species")]
The result:
[1] setosa setosa setosa setosa setosa setosa .....
You can also use a string directly inside the double bracket operator, like this:
dt[["Species"]]
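When the column name is held in a variable rather than typed as a literal, the same two forms apply. A minimal sketch, where col is a hypothetical variable name:

```r
library(data.table)

dt <- data.table(iris)
col <- "Species"   # hypothetical variable holding the column name

v1 <- dt[, get(col)]   # evaluated in j, so it can be combined with i and by
v2 <- dt[[col]]        # list-style extraction of a single column

identical(v1, v2)

# Both give a plain vector; dt[, ..col] keeps a one-column data.table instead
one_col <- dt[, ..col]
```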
How to select columns in a dataframe, and skip over columns that don't exist - using data.table syntax
You can use intersect to keep only the columns that are available in the data.table.
library(data.table)
dt <- data.table(a = 1:5, b = 2:6)
select_cols <- function(dt, cols) {
  # keep only the requested columns that actually exist in dt
  cols <- intersect(names(dt), cols)
  dt[, ..cols]
}
select_cols(dt, c('a', 'b'))
# a b
#1: 1 2
#2: 2 3
#3: 3 4
#4: 4 5
#5: 5 6
select_cols(dt, c('a', 'c'))
# a
#1: 1
#2: 2
#3: 3
#4: 4
#5: 5