How to Reference Column Names That Start with a Number, in Data.Table

How to reference column names that start with a number, in data.table

I think, this is what you're looking for, not sure. data.table is different from data.frame. Please have a look at the quick introduction, and then the FAQ (and also the reference manual if necessary).

require(data.table)
dt <- data.table("4PCS" = 1:3, y=3:1)
#   4PCS y
# 1:    1 3
# 2:    2 2
# 3:    3 1

# access column 4PCS
dt[, "4PCS"]

# returns a data.table
# 4PCS
# 1: 1
# 2: 2
# 3: 3

# to access multiple columns by name
dt[, c("4PCS", "y")]

Alternatively, if you need to access the column and not result in a data.table, rather a vector, then you can access using the $ notation:

dt$`4PCS` # notice the ` because the variable begins with a number
# [1] 1 2 3

# alternatively, as mnel mentioned under comments:
dt[, `4PCS`]
# [1] 1 2 3

Or if you know the column number you can access using [[.]] as follows:

dt[[1]] # 4PCS is the first column here
# [1] 1 2 3

Edit:

Thanks @joran. I think you're looking for this:

dt[, `4PCS` + y]
# [1] 4 4 4

Fundamentally the issue is that 4CPS is not a valid variable name in R (try 4CPS <- 1, you'll get the same "Unexpected symbol" error). So to refer to it, we have to use backticks (compare`4CPS` <- 1)

Datatable column name omits names starting with a number

From the SQL standards, table names cannot start with a number.
See this post for further information: Table or column name cannot start with numeric?

Reference `data.table` column by name

It sounds like you're looking for get:

DT = data.table(a=1, b=2, "a+b"=8)
col = "a+b"
DT[, get(col) * 2]
# [1] 16
DT[, c := get(col) * 2]
DT
# a b a+b c
# 1: 1 2 8 16

Referring to data.table columns by names saved in variables

If you are going to be doing complicated operations inside your j expressions, you should probably use eval and quote. One problem with that in current version of data.table is that the environment of eval is not always correctly processed - eval and quote in data.table (Note: There has been an update to that answer based on an update to the package.) - and the current fix for that is to add .SD to eval. As far as I can tell from a few tests that I've run this doesn't affect speed (the way e.g. having .SD[1] in j would).

Interestingly this issue only plagues the j and you'll be fine using eval normally in i (where .SD is not available anyway).

The other problem is assignment, and there you have to have strings. I know one way to extract the string name from a quoted expression - it's not pretty, but it works. Here's an example combining everything together:

x = data.table(dist = c(1:10), val = c(1:10))
distcol = quote(dist)
valcol = quote(val)

x[eval(valcol) < 5,
capture.output(str(distcol, give.head = F)) := eval(distcol)*sum(eval(distcol, .SD))]

Note how I was ok not adding .SD in one eval(distcol), but won't be if I take it out of the other eval.

Another option is to use get:

diststr = "dist"
valstr = "val"

x[get(valstr) < 5, c(diststr) := get(diststr)*sum(get(diststr))]

Using a variable to specify a column name within `data.table`

Data:

library(data.table)
dt = data.table(col1=letters[1:2], x=c('1','2'))

One solution is to use quote and the eval in your data.table:

y = quote(x)
dt[,eval(y):=as.numeric(eval(y))]

#> is.numeric(dt$x)
#[1] TRUE

Pass column name in data.table using variable

Use the quote() and eval() functions to pass a variable to j. You don't need double-quotes on the column names when you do it this way, because the quote()-ed string will be evaluated inside the DT[]

temp <- quote(x)
DT[ , eval(temp)]
# [1] "b" "b" "b" "a" "a"

With a single column name, the result is a vector. If you want a data.table result, or several columns, use list form

temp <- quote(list(x, v))
DT[ , eval(temp)]
# x v
# 1: b 1.52566586
# 2: b 0.66057253
# 3: b -1.29654641
# 4: a -1.71998260
# 5: a 0.03159933

creating, directly, data.tables with column names from variables, and using variables for column names with :=

For the first question, I'm not absolutely sure, but you may want to try and see if fread is of any help creating an empty data.table with named columns.

As for the second question, try

DT[, c(nameOfCols) := 10]

Where nameOfCols is the vector with names of the columns you want to modify. See ?data.table

Calculating columns using number strings in column names, data.table

I think pivoting (longer), summarizing, then unpivoting (wider) works. (I wonder if keeping it in long form might be better in the long run, over to you.)

library(data.table)
money_table <- setDT(structure(list(ID = 1:3, LunchMoney_1213 = c(12L, 234L, 14L), DinnerMondey_1213 = c(24L, 12L, 19L), LunchMoney_1314 = c(17L, 43L, 2L), DinnerMondy_1314 = c(18L, 44L, 12L)), row.names = c(NA, -3L), class = "data.frame"))

dcast(
melt(money_table, id.vars = "ID"
)[, yr := paste0("TotalMoney_", gsub(".*_", "", variable))
][, .(value = sum(value)), by = .(ID, yr)
],
ID ~ yr, value.vars = "value")
# ID TotalMoney_1213 TotalMoney_1314
# <int> <int> <int>
# 1: 1 36 35
# 2: 2 246 87
# 3: 3 33 14

If you're already using magrittr for other things (whether with dplyr or not ... I use it with data.table all the time), this can be slightly more readable as:

library(magrittr)
melt(money_table, id.vars = "ID") %>%
.[, yr := paste0("TotalMoney_", gsub(".*_", "", variable))] %>%
.[, .(value = sum(value)), by = .(ID, yr)] %>%
dcast(., ID ~ yr, value.vars = "value")

Use variable instead of data table column name reference in sum

We can use the [[ to do this

searchMonth <- "Mar 2018"
sum(Table1[[searchMonth]][Table1$Product=="Apples"])

Backquotes or quotes are only needed when there are spaces or special characters in the column names. Otherwise, it can be simply extracted without quoting



Related Topics



Leave a reply



Submit