How to reference column names that start with a number, in data.table
I think, this is what you're looking for, not sure. data.table
is different from data.frame
. Please have a look at the quick introduction, and then the FAQ (and also the reference manual if necessary).
require(data.table)
dt <- data.table("4PCS" = 1:3, y=3:1)
# 4PCS y
# 1: 1 3
# 2: 2 2
# 3: 3 1
# access column 4PCS
dt[, "4PCS"]
# returns a data.table
# 4PCS
# 1: 1
# 2: 2
# 3: 3
# to access multiple columns by name
dt[, c("4PCS", "y")]
Alternatively, if you need to access the column and not result in a data.table
, rather a vector, then you can access using the $
notation:
dt$`4PCS` # notice the ` because the variable begins with a number
# [1] 1 2 3
# alternatively, as mnel mentioned under comments:
dt[, `4PCS`]
# [1] 1 2 3
Or if you know the column number you can access using [[.]]
as follows:
dt[[1]] # 4PCS is the first column here
# [1] 1 2 3
Edit:
Thanks @joran. I think you're looking for this:
dt[, `4PCS` + y]
# [1] 4 4 4
Fundamentally the issue is that 4CPS
is not a valid variable name in R (try 4CPS <- 1
, you'll get the same "Unexpected symbol" error). So to refer to it, we have to use backticks (compare`4CPS` <- 1
)
Datatable column name omits names starting with a number
From the SQL standards, table names cannot start with a number.
See this post for further information: Table or column name cannot start with numeric?
Reference `data.table` column by name
It sounds like you're looking for get
:
DT = data.table(a=1, b=2, "a+b"=8)
col = "a+b"
DT[, get(col) * 2]
# [1] 16
DT[, c := get(col) * 2]
DT
# a b a+b c
# 1: 1 2 8 16
Referring to data.table columns by names saved in variables
If you are going to be doing complicated operations inside your j
expressions, you should probably use eval
and quote
. One problem with that in current version of data.table
is that the environment of eval
is not always correctly processed - eval and quote in data.table (Note: There has been an update to that answer based on an update to the package.) - and the current fix for that is to add .SD
to eval
. As far as I can tell from a few tests that I've run this doesn't affect speed (the way e.g. having .SD[1]
in j
would).
Interestingly this issue only plagues the j
and you'll be fine using eval
normally in i
(where .SD
is not available anyway).
The other problem is assignment, and there you have to have strings. I know one way to extract the string name from a quoted expression - it's not pretty, but it works. Here's an example combining everything together:
x = data.table(dist = c(1:10), val = c(1:10))
distcol = quote(dist)
valcol = quote(val)
x[eval(valcol) < 5,
capture.output(str(distcol, give.head = F)) := eval(distcol)*sum(eval(distcol, .SD))]
Note how I was ok not adding .SD
in one eval(distcol)
, but won't be if I take it out of the other eval
.
Another option is to use get
:
diststr = "dist"
valstr = "val"
x[get(valstr) < 5, c(diststr) := get(diststr)*sum(get(diststr))]
Using a variable to specify a column name within `data.table`
Data:
library(data.table)
dt = data.table(col1=letters[1:2], x=c('1','2'))
One solution is to use quote
and the eval
in your data.table
:
y = quote(x)
dt[,eval(y):=as.numeric(eval(y))]
#> is.numeric(dt$x)
#[1] TRUE
Pass column name in data.table using variable
Use the quote()
and eval()
functions to pass a variable to j
. You don't need double-quotes on the column names when you do it this way, because the quote()
-ed string will be evaluated inside the DT[]
temp <- quote(x)
DT[ , eval(temp)]
# [1] "b" "b" "b" "a" "a"
With a single column name, the result is a vector. If you want a data.table result, or several columns, use list form
temp <- quote(list(x, v))
DT[ , eval(temp)]
# x v
# 1: b 1.52566586
# 2: b 0.66057253
# 3: b -1.29654641
# 4: a -1.71998260
# 5: a 0.03159933
creating, directly, data.tables with column names from variables, and using variables for column names with :=
For the first question, I'm not absolutely sure, but you may want to try and see if fread
is of any help creating an empty data.table with named columns.
As for the second question, try
DT[, c(nameOfCols) := 10]
Where nameOfCols
is the vector with names of the columns you want to modify. See ?data.table
Calculating columns using number strings in column names, data.table
I think pivoting (longer), summarizing, then unpivoting (wider) works. (I wonder if keeping it in long form might be better in the long run, over to you.)
library(data.table)
money_table <- setDT(structure(list(ID = 1:3, LunchMoney_1213 = c(12L, 234L, 14L), DinnerMondey_1213 = c(24L, 12L, 19L), LunchMoney_1314 = c(17L, 43L, 2L), DinnerMondy_1314 = c(18L, 44L, 12L)), row.names = c(NA, -3L), class = "data.frame"))
dcast(
melt(money_table, id.vars = "ID"
)[, yr := paste0("TotalMoney_", gsub(".*_", "", variable))
][, .(value = sum(value)), by = .(ID, yr)
],
ID ~ yr, value.vars = "value")
# ID TotalMoney_1213 TotalMoney_1314
# <int> <int> <int>
# 1: 1 36 35
# 2: 2 246 87
# 3: 3 33 14
If you're already using magrittr
for other things (whether with dplyr
or not ... I use it with data.table
all the time), this can be slightly more readable as:
library(magrittr)
melt(money_table, id.vars = "ID") %>%
.[, yr := paste0("TotalMoney_", gsub(".*_", "", variable))] %>%
.[, .(value = sum(value)), by = .(ID, yr)] %>%
dcast(., ID ~ yr, value.vars = "value")
Use variable instead of data table column name reference in sum
We can use the [[
to do this
searchMonth <- "Mar 2018"
sum(Table1[[searchMonth]][Table1$Product=="Apples"])
Backquotes or quotes are only needed when there are spaces or special characters in the column names. Otherwise, it can be simply extracted without quoting
Related Topics
How Does R Handle Object in Function Call
R - Data Frame - Convert to Sparse Matrix
Unexpected Behaviour with Argument Defaults
Creating a More Continuous Color Palette in R, Ggplot2, Lattice, or Latticeextra
How to Simultaneously Apply Color/Shape/Size in a Scatter Plot Using Plotly
Search for Corresponding Node in a Regression Tree Using Rpart
Error When Exporting Dataframe to Text File in R
Fitting Logarithmic Curve in R
How to Control Label Color Depending on Fill Darkness of Bars
Calculate Elapsed Time Since Last Event
R: Row-Wise Dplyr::Mutate Using Function That Takes a Data Frame Row and Returns an Integer
R Reshape2 'Aggregation Function Missing: Defaulting to Length'
R: How to Aggregate Some Columns While Keeping Other Columns
Format Axis Tick Labels to Percentage in Plotly
How to Have a New Line in a 'Bquote' Expression Used with 'Text'