Accessing Columns in data.table Using a Character Vector of Column Names

Accessing columns in data.table using a character vector of column names

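You can use the data.table prefix .., which "looks up one level" (like .. in a Unix terminal), so the variable is resolved in the calling scope rather than among the table's columns. With cols holding c("x", "y"):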

> all.equal(DT[,list(x,y)], DT[, ..cols])
[1] TRUE
> all.equal(DT[,.SD[,list(x,y)][min(v)]], DT[,.SD[min(v)], .SDcols = cols])
[1] TRUE

More details under FAQ 1.6 I believe: http://datatable.r-forge.r-project.org/datatable-faq.pdf
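
To make the first comparison above reproducible, here is a minimal sketch. The table DT and the definition of cols are made up for illustration, since the original post does not show them. It also includes with = FALSE, the older spelling of the same lookup.

```r
library(data.table)

# Hypothetical data: the original DT is not shown in the post
DT <- data.table(x = c("a", "b", "c"), y = c(10, 20, 30), v = 3:1)
cols <- c("x", "y")  # assumed definition of cols

by_name   <- DT[, list(x, y)]          # columns named directly in j
by_prefix <- DT[, ..cols]              # cols is looked up outside DT
by_with   <- DT[, cols, with = FALSE]  # older equivalent of the .. prefix

print(all.equal(by_name, by_prefix))
print(all.equal(by_prefix, by_with))
```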

Selecting columns of a data.table using a vector of column names or column positions without using with = F

One option is to use the double-dot prefix:

DT[, ..mycols]
# A C
#1: 0.1188208 -0.17328827
#2: -0.5622505 0.84231231
#3: 0.8111072 -1.59802306
#4: 0.7968823 2.08468489
# ...

Or specify it in .SDcols

DT[, .SD, .SDcols = mycols]

Otherwise, use with = FALSE, as the OP mentioned in the post.
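
As a rough check that these options agree, the sketch below runs all of them on a small made-up table. The table, the seed, and the names mycols and mypos are assumptions for illustration. The last call selects by column position instead of by name.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only for reproducibility
DT <- data.table(A = rnorm(5), B = rnorm(5), C = rnorm(5))  # hypothetical table

mycols <- c("A", "C")  # selection by name
mypos  <- c(1L, 3L)    # the same selection by position

res_dots   <- DT[, ..mycols]
res_sdcols <- DT[, .SD, .SDcols = mycols]
res_with   <- DT[, mycols, with = FALSE]
res_pos    <- DT[, ..mypos]

print(all.equal(res_dots, res_sdcols))
print(all.equal(res_dots, res_with))
print(all.equal(res_dots, res_pos))
```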

How to use R object of character vector as column names when splitting columns of a data.table?

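You can use (namesForm) := instead of namesForm :=. The parentheses make data.table evaluate namesForm and use its contents as the new column names; without them, the left-hand side is taken as a literal column name.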

Example:

test2 <- copy(test)
namesForm <- c("GT", "DP", "RO", "QR", "AO", "QA", "GL")

str(test[, c("GT", "DP", "RO", "QR", "AO", "QA", "GL") := tstrsplit(value, ":", fixed=TRUE)])
# Classes ‘data.table’ and 'data.frame': 10 obs. of 9 variables:
# $ POS : num 254 280 303 22 105 173 230 235 257 258
# $ value: chr "0/1:15:3:123:12:478:-38.8484,0,-6.94934" "0/0:15:15:577:0:0:0,-4.51545,-52.25" "0/0:13:13:276:0:0:0,-3.91339,-25.0455" "0/0:367:347:13643:0:0:0,-104.457,-1226.73" ...
# $ GT : chr "0/1" "0/0" "0/0" "0/0" ...
# $ DP : chr "15" "15" "13" "367" ...
# $ RO : chr "3" "15" "13" "347" ...
# $ QR : chr "123" "577" "276" "13643" ...
# $ AO : chr "12" "0" "0" "0" ...
# $ QA : chr "478" "0" "0" "0" ...
# $ GL : chr "-38.8484,0,-6.94934" "0,-4.51545,-52.25" "0,-3.91339,-25.0455" "0,-104.457,-1226.73" ...
# - attr(*, ".internal.selfref")=<externalptr>

str(test2[, (namesForm) := tstrsplit(value, ":", fixed=TRUE)])
# Classes ‘data.table’ and 'data.frame': 10 obs. of 9 variables:
# $ POS : num 254 280 303 22 105 173 230 235 257 258
# $ value: chr "0/1:15:3:123:12:478:-38.8484,0,-6.94934" "0/0:15:15:577:0:0:0,-4.51545,-52.25" "0/0:13:13:276:0:0:0,-3.91339,-25.0455" "0/0:367:347:13643:0:0:0,-104.457,-1226.73" ...
# $ GT : chr "0/1" "0/0" "0/0" "0/0" ...
# $ DP : chr "15" "15" "13" "367" ...
# $ RO : chr "3" "15" "13" "347" ...
# $ QR : chr "123" "577" "276" "13643" ...
# $ AO : chr "12" "0" "0" "0" ...
# $ QA : chr "478" "0" "0" "0" ...
# $ GL : chr "-38.8484,0,-6.94934" "0,-4.51545,-52.25" "0,-3.91339,-25.0455" "0,-104.457,-1226.73" ...
# - attr(*, ".internal.selfref")=<externalptr>
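
The difference the parentheses make can be seen on a much smaller table. This is only a sketch: the two-row table and the shortened namesForm are stand-ins for the data above, and the assignment of "x" is there purely to show which column gets created.

```r
library(data.table)

# Hypothetical two-row table standing in for the 'test' data above
test <- data.table(POS = c(254, 280), value = c("0/1:15:3", "0/0:15:15"))
namesForm <- c("GT", "DP", "RO")

with_parens <- copy(test)
# Parentheses: namesForm is evaluated, so three new columns are created
with_parens[, (namesForm) := tstrsplit(value, ":", fixed = TRUE)]
print(names(with_parens))

without_parens <- copy(test)
# No parentheses: a single column literally called "namesForm" is created
without_parens[, namesForm := "x"]
print(names(without_parens))
```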

Concatenating a vector of column names in R data.table

We can use mget to return the columns named in 'cols' as a list, which do.call then passes to paste:

dt[, slice := do.call(paste, mget(cols))]
head(dt, 2)
# mpg cyl disp hp drat wt qsec vs am gear carb slice
#1: 21 6 160 110 3.9 2.620 16.46 0 1 4 4 110 21 6
#2: 21 6 160 110 3.9 2.875 17.02 0 1 4 4 110 21 6

Or another option is to specify the 'cols' in .SDcols and paste the .SD

dt[, slice:= do.call(paste, .SD), .SDcols = cols]
head(dt, 2)
# mpg cyl disp hp drat wt qsec vs am gear carb slice
#1: 21 6 160 110 3.9 2.620 16.46 0 1 4 4 110 21 6
#2: 21 6 160 110 3.9 2.875 17.02 0 1 4 4 110 21 6
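
If a separator other than a space is wanted, one possibility is to append sep to the list of columns before calling paste. The sketch below assumes cols is c("hp", "mpg", "cyl"), which is consistent with the slice values shown above but is not stated in the post. The column name slice_us is made up.

```r
library(data.table)

dt <- as.data.table(mtcars)
cols <- c("hp", "mpg", "cyl")  # assumed; matches the slice values shown above

# c(.SD, sep = "_") builds a list of the selected columns plus the sep argument,
# so do.call() effectively runs paste(hp, mpg, cyl, sep = "_")
dt[, slice_us := do.call(paste, c(.SD, sep = "_")), .SDcols = cols]
print(head(dt$slice_us, 2))
```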

How to select columns in data.table using a character vector of certain column names?

We can use the .. prefix so that myVector is looked up in the calling scope and used as a vector of column names (or positions), much like indexing a data.frame:

mtcarsDT[, ..myVector]

According to ?data.table:

In case of overlapping variables names inside dataset and in parent scope you can use double dot prefix ..cols to explicitly refer to 'cols variable parent scope and not from your dataset.
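
The overlapping-name case described in that passage can be sketched as follows. The table and its column called cols are hypothetical, chosen only to create the name clash.

```r
library(data.table)

# Hypothetical table that happens to contain a column called 'cols'
DT <- data.table(a = 1:3, b = 4:6, cols = c("p", "q", "r"))
cols <- c("a", "b")  # variable in the calling scope with the same name

col_itself <- DT[, cols]    # the column named 'cols', returned as a vector
selected   <- DT[, ..cols]  # columns a and b, chosen via the outer variable

print(col_itself)
print(names(selected))
```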

Nice way to group data in a `data.table` when the new column name is given as a character vector

Although it is probably not what you are looking for, you could use setNames inside j, wrapped around .(z = mean(y)):

library(data.table)

dt[, setNames(.(z = mean(y)), agg_col_name), by = x]

Or use setnames after doing the summary:

setnames(dt[, mean(y), by = x], 'V1', agg_col_name)[]

Output

   x        avg
1: 1 0.5626526
2: 2 0.3549653
3: 3 -0.2861405
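
For a self-contained run of both approaches, here is a sketch on made-up data. The table, the seed, and the value "avg" for agg_col_name are assumptions ("avg" is inferred from the output header above), so the numbers will differ from the output shown. It uses list() in place of the .() alias.

```r
library(data.table)

set.seed(42)  # arbitrary seed; values will not match the output above
dt <- data.table(x = rep(1:3, each = 4), y = rnorm(12))  # hypothetical data
agg_col_name <- "avg"  # assumed from the column header in the output

# Approach 1: name the summary inside j
res_inside <- dt[, setNames(list(mean(y)), agg_col_name), by = x]

# Approach 2: summarise first, then rename the default V1 column
res_after <- setnames(dt[, mean(y), by = x], "V1", agg_col_name)

print(names(res_inside))
print(all.equal(res_inside, res_after))
```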

However, as mentioned in the comments, this is easier with the dev version of data.table (required at the time of writing). You can read more about the development of this feature at programming on data.table #4304: https://github.com/Rdatatable/data.table/pull/4304

# Latest development version:
data.table::update.dev.pkg()

library(data.table)

dt[, .(z = mean(y)), by = x, env = list(z=agg_col_name)]

# x avg
#1: 1 -0.1640783
#2: 2 0.5375794
#3: 3 0.1539785

Select matching columns from a data table using a list of column names

Try dat[, ..col.list].

The .. prefix tells data.table to look for col.list in the parent frame (i.e. the environment from which dat[...] is called) rather than among the columns of dat itself.

Access data.table columns with strings

You can call get() in the j argument of the single-bracket operator:

library(data.table)
dt <- data.table(iris)
dt[, get("Species")]

The result:

[1] setosa     setosa     setosa     setosa     setosa     setosa .....

You can also use a string directly inside the double bracket operator, like this:

dt[["Species"]]
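
Both forms also work when the column name is stored in a variable, which is the more common situation. In this sketch col_name is a hypothetical variable. The .. prefix is included for contrast, since it returns a one-column data.table rather than a vector.

```r
library(data.table)

dt <- data.table(iris)
col_name <- "Species"  # hypothetical variable holding the column name

via_get      <- dt[, get(col_name)]  # vector
via_brackets <- dt[[col_name]]       # vector
via_dots     <- dt[, ..col_name]     # one-column data.table

print(identical(via_get, via_brackets))
print(class(via_dots))
```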

How to select columns in a dataframe, and skip over columns that don't exist - using data.table syntax

You can use intersect to keep the columns which are available in data.table.

library(data.table)
dt <- data.table(a = 1:5, b = 2:6)

select_cols <- function(dt, cols) {
  cols <- intersect(names(dt), cols)
  dt[, ..cols]
}

select_cols(dt, c('a', 'b'))

# a b
#1: 1 2
#2: 2 3
#3: 3 4
#4: 4 5
#5: 5 6

select_cols(dt, c('a', 'c'))
# a
#1: 1
#2: 2
#3: 3
#4: 4
#5: 5
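
Note that intersect(names(dt), cols) returns the columns in the table's order. If the requested order should be kept instead, one possible variant is to swap the arguments. The function name select_cols_ordered is made up for this sketch.

```r
library(data.table)

dt <- data.table(a = 1:5, b = 2:6)

# Hypothetical variant: intersect() follows the order of its first argument,
# so putting cols first keeps the order in which columns were requested
select_cols_ordered <- function(dt, cols) {
  keep <- intersect(cols, names(dt))
  dt[, ..keep]
}

res_ordered <- select_cols_ordered(dt, c("b", "c", "a"))
print(names(res_ordered))
```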

