Extract a Column from a Data.Table as a Vector, by Position

Extract a column from a data.table as a vector, by position

A data.table inherits from class data.frame. Therefore it is a list (of column vectors) internally and can be treated as such.

is.list(DT)
#[1] TRUE

Fortunately, list subsetting with [[ is very fast and, unlike [, package data.table doesn't define a method for it. Thus, you can simply use [[ to extract a column by its index:

DT[[2]]
#[1] 3 4
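The DT object used above is not shown in the source. As a minimal sketch, assuming a two-column table whose second column holds 3 and 4 (the column names x and y are made up for illustration), extraction by position looks like this:

```r
library(data.table)

# Hypothetical table; chosen only so that the second column is 3, 4
DT <- data.table(x = 1:2, y = 3:4)

# List-style extraction by position returns the column itself
second <- DT[[2]]

is.vector(second)         # a plain vector, not a data.table
identical(second, DT$y)   # same object contents as extraction by name
```

Because [[ treats the table as a list, the same call works unchanged on a plain data.frame.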

Extract a column by reference from a data.table as a vector

We can use [[ to extract the column as a vector, where col holds the column's name or position:

is.vector(DT[[col]])
#[1] TRUE
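The variable col is not defined in the snippet above. A small sketch, assuming col holds either a column name or a column position (the table and its column names are hypothetical), shows that both forms return the same vector:

```r
library(data.table)

# Hypothetical table for illustration
DT <- data.table(id = 1:3, value = c(2.5, 3.5, 4.5))

col <- "value"            # column referred to by name
v_by_name <- DT[[col]]

col <- 2L                 # the same column referred to by position
v_by_pos <- DT[[col]]

is.vector(v_by_name)
identical(v_by_name, v_by_pos)
```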

Extract columns from data table by numeric indices stored in a vector

We can use double dots (..) before the object 'a', which holds the column positions, to extract the columns:

dt[, ..a]
#   col4 col5 col6
#1:    4    5    6
#2:    5    6    7
#3:    6    7    8
#4:    7    8    9

Another option is with = FALSE:

dt[, a, with = FALSE]

data

dt <- data.table(col1 = 1:4, col2 = 2:5, col3 = 3:6, col4 = 4:7, col5 = 5:8, col6 = 6:9)
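The value of a is not given in the source. Assuming a <- 4:6, which is consistent with the printed columns col4 to col6, the two approaches can be compared in one runnable sketch:

```r
library(data.table)

dt <- data.table(col1 = 1:4, col2 = 2:5, col3 = 3:6,
                 col4 = 4:7, col5 = 5:8, col6 = 6:9)

# Assumed index vector; not shown in the original answer
a <- 4:6

res_dots <- dt[, ..a]               # .. looks up 'a' in the calling scope
res_with <- dt[, a, with = FALSE]   # with = FALSE treats j as plain column indices

names(res_dots)
all.equal(res_dots, res_with)
```

Both calls return a data.table, not a vector, even if a has length one.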

Convert data.frame column to a vector?

I'm going to attempt to explain this without making any mistakes, but I'm betting this will attract a clarification or two in the comments.

A data frame is a list. When you subset a data frame using the name of a column and [, what you're getting is a sublist (or a sub data frame). If you want the actual atomic column, you could use [[, or somewhat confusingly (to me) you could do aframe[,2] which returns a vector, not a sublist.

So try running this sequence and maybe things will be clearer:

avector <- as.vector(aframe['a2'])
class(avector)

avector <- aframe[['a2']]
class(avector)

avector <- aframe[,2]
class(avector)
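The aframe object is not defined in the source. A self-contained sketch with a made-up two-column data frame shows the difference between the three forms. It checks the results with is.data.frame and is.atomic rather than printing class(), since the class reported for the as.vector() result may differ between R versions:

```r
# Hypothetical data frame for illustration
aframe <- data.frame(a1 = 1:3, a2 = c(10, 20, 30))

sub_df  <- aframe["a2"]     # single bracket with a name: still a data frame
by_name <- aframe[["a2"]]   # double bracket: the column itself
by_pos  <- aframe[, 2]      # matrix-style indexing: also the column

is.data.frame(sub_df)
is.atomic(by_name)
identical(by_name, by_pos)

# as.vector() on the one-column data frame still yields a list,
# not an atomic vector
is.list(as.vector(sub_df))
```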

Select column of data.table and return vector

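With data.frame, selecting a single column with [ uses drop = TRUE by default, so the result is simplified to a vector. data.table does the opposite: it keeps the result as a data.table and ignores the drop argument. According to ?data.table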

drop - Never used by data.table. Do not use. It needs to be here because data.table inherits from data.frame.

In order to get the same behavior, we can use [[ to extract the column by passing a string

identical(dat[["Species"]], iris[, "Species"])
#[1] TRUE

Or

dat$Species

Using [[ or $ extracts the column as a vector and also bypasses the overhead of [.data.table.
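The object dat is not defined in the source. Assuming it is iris converted with as.data.table, and assuming a reasonably recent data.table version, the difference in drop behavior can be sketched as follows:

```r
library(data.table)

# Assumption: dat is the iris data set as a data.table
dat <- as.data.table(iris)

df_col <- iris[, "Species"]   # data.frame drops to a vector (a factor here)
dt_col <- dat[, "Species"]    # data.table keeps a one-column data.table

is.factor(df_col)
is.data.table(dt_col)

# Three ways to get the bare column from the data.table
identical(dat[["Species"]], df_col)
identical(dat$Species, df_col)
identical(dat[, Species], df_col)   # unquoted name in j also returns the vector
```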

How to select columns in data.table using a character vector of certain column names?

We can use the .. prefix so that myVector is looked up in the calling scope and its value is taken as the column names (or positions) to select, like it would work in data.frame

mtcarsDT[, ..myVector]

According to ?data.table

In case of overlapping variables names inside dataset and in parent scope you can use double dot prefix ..cols to explicitly refer to 'cols' variable parent scope and not from your dataset.
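Neither mtcarsDT nor myVector is defined in the source. The sketch below assumes mtcarsDT is mtcars converted to a data.table and picks arbitrary column names. It also illustrates the overlapping-name case from the quoted documentation with a variable deliberately named like a column:

```r
library(data.table)

# Assumption: mtcarsDT is the built-in mtcars data set as a data.table
mtcarsDT <- as.data.table(mtcars)

# Hypothetical selection of column names
myVector <- c("mpg", "cyl", "disp")
by_name <- mtcarsDT[, ..myVector]
names(by_name)

# A variable that shares its name with a column
mpg <- c("hp", "wt")
from_table <- mtcarsDT[, mpg]     # the mpg column, as a numeric vector
from_scope <- mtcarsDT[, ..mpg]   # columns hp and wt, named by the outer variable
names(from_scope)
```

Without the prefix, the name is resolved inside the table first, which is why the last two calls give different results.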

Selecting columns of a data.table using a vector of column names or column positions without using with = F

An option is to use double dots

DT[, ..mycols]
#            A           C
#1:  0.1188208 -0.17328827
#2: -0.5622505  0.84231231
#3:  0.8111072 -1.59802306
#4:  0.7968823  2.08468489
# ...

Or specify it in .SDcols

DT[, .SD, .SDcols = mycols]

or else with = FALSE as the OP mentioned in the post
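The three options can be checked against each other. The table below is a made-up stand-in (the original DT and mycols are not shown), so its values will not match the printed output above:

```r
library(data.table)

# Hypothetical data; the seed only makes the sketch reproducible
set.seed(1)
DT <- data.table(A = rnorm(5), B = rnorm(5), C = rnorm(5))
mycols <- c("A", "C")

r_dots <- DT[, ..mycols]                  # double-dot prefix
r_sd   <- DT[, .SD, .SDcols = mycols]     # .SD restricted by .SDcols
r_with <- DT[, mycols, with = FALSE]      # with = FALSE

all.equal(r_dots, r_sd)
all.equal(r_dots, r_with)

# The same forms accept column positions instead of names
mypos <- c(1L, 3L)
all.equal(DT[, ..mypos], r_dots)
```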

Split a data.table at position

You can use findInterval/cut to create groups based on pos:

library(data.table)
x[, mean(a), findInterval(a, pos)]

#   findInterval  V1
#1:            0 1.5
#2:            1 3.5
#3:            2 6.5
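The inputs x and pos are not shown in the source. One set of values that is consistent with the group means printed above is a = 1:8 with break points at 3 and 5; this is an assumption, not the original data. The sketch also shows how the same grouping vector could be used to actually split the table into pieces:

```r
library(data.table)

# Assumed inputs; chosen only because they reproduce the group means above
x <- data.table(a = 1:8)
pos <- c(3, 5)

# Group id for each row: how many break points are <= the value
grp <- findInterval(x$a, pos)
grp

# Aggregate per group, as in the answer
res <- x[, mean(a), by = findInterval(a, pos)]
res

# Split the table itself into a list of data.tables, one per group
parts <- split(x, f = grp)
names(parts)
```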

