Why does .. work to pass column names in a character vector variable?
This was a new, experimental feature added in data.table v1.10.2. It is explained in the NEW FEATURES section of the data.table news for changes in v1.10.2.
It reads (quoted directly):
When `j` is a symbol prefixed with `..` it will be looked up in calling scope and its value taken to be column names or numbers.
myCols = c("colA","colB")
DT[, myCols, with=FALSE]
DT[, ..myCols]              # same
When you see the `..` prefix think one-level-up like the directory `..` in all operating systems meaning the parent directory. In future the `..` prefix could be made to work on all symbols appearing anywhere inside `DT[...]`. It is intended to be a convenient way to protect your code from accidentally picking up a column name. Similar to how `x.` and `i.` prefixes (analogous to SQL table aliases) can already be used to disambiguate the same column name present in both `x` and `i`. A symbol prefix rather than a `..()` function will be easier for us to optimize internally and more convenient if you have many variables in calling scope that you wish to use in your expressions safely. This feature was first raised in 2012 and long wished for, #633. It is experimental.
Note: This answer by Arun led me to this information.
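A minimal sketch of the behaviour the NEWS item describes, assuming a recent data.table. The table `DT2`, which has a column literally named `myCols`, is a hypothetical example of the name clash that the `..` prefix protects against:

```r
library(data.table)

DT <- data.table(colA = 1:3, colB = 4:6, colC = 7:9)
myCols <- c("colA", "colB")

# The two forms from the NEWS item select the same columns
a <- DT[, myCols, with = FALSE]
b <- DT[, ..myCols]
print(all.equal(a, b))

# Hypothetical clash: the table also has a column called "myCols"
DT2 <- data.table(colA = 1:3, colB = 4:6, myCols = c("x", "y", "z"))
DT2[, myCols]    # a bare symbol in j resolves to the column itself
DT2[, ..myCols]  # the .. prefix forces the calling-scope variable instead
```

The bare symbol is looked up among the columns first, which is exactly the accidental pick-up the prefix avoids.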
Pass character string of column names (e.g. c(speed, dist)) to `across` function in R
`substitute()` and `eval()` won't help here, because neither of them parses a character vector. You first need to parse the string into a language object; otherwise, when you `eval()` a string, you just get the same string back. This is unlike `eval` in some other languages. One way to do the parsing is `str2lang()`. You can then inject the resulting expression into `across()` using tidy evaluation's `!!` operator. For example:
mtcars_2 %>%
mutate(across(.cols = !!str2lang(.$cols_to_modify),.fns = round))
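For illustration, a minimal sketch using the built-in `mtcars` data in place of the question's `mtcars_2`, with the column selection stored as the string `"c(mpg, disp)"` (an assumption standing in for `cols_to_modify`). As a swapped-in alternative, it also shows tidyselect's `all_of()`, which takes a plain character vector and needs no parsing at all:

```r
library(dplyr)

cols_string <- "c(mpg, disp)"   # assumed: column selection stored as text

# eval() on a string simply returns the string; nothing is parsed
eval(cols_string)

# str2lang() (R >= 3.6.0) parses the text into a call that !! can inject
cols_expr <- str2lang(cols_string)
res1 <- mtcars %>% mutate(across(.cols = !!cols_expr, .fns = round))

# Alternative when the names are already a character vector
cols_chr <- c("mpg", "disp")
res2 <- mtcars %>% mutate(across(all_of(cols_chr), round))
```

If you control how the column names are stored, keeping them as a character vector and using `all_of()` avoids building code from strings.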
How to use a character vector of column names in the formula argument of dcast (reshape2)
You can use `as.formula` to construct a formula.
Here's an example:
library(reshape2)
## Example from `melt.data.frame`
names(airquality) <- tolower(names(airquality))
df_id <- c("month", "day")
aq <- melt(airquality, id = df_id)
## Constructing the formula
f <- as.formula(paste(paste(df_id, collapse = " + "), "~ variable"))
## Applying it....
dcast(aq, f, value.var = "value", fun.aggregate = mean)
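The same idea generalises to character vectors on both sides of the `~`. The helper below, `make_cast_formula()`, is hypothetical (it is not part of reshape2) and uses only base R, so the formula can be checked without running `dcast()`:

```r
# Hypothetical helper: build "lhs1 + lhs2 ~ rhs1 + rhs2" from character vectors
make_cast_formula <- function(lhs, rhs) {
  as.formula(paste(paste(lhs, collapse = " + "), "~", paste(rhs, collapse = " + ")))
}

f <- make_cast_formula(c("month", "day"), "variable")
print(f)
all.vars(f)
```

The resulting object can be passed to `dcast()` exactly like `f` in the example above.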
data.table: why is it not always possible to pass column names directly?
(i), (iii) and (iv) sound like Feature Requests (FRs); see here (so, yes, it's partly due to data.table not having reached full maturity).
As to (v), you said "`dt[, c(x1, x2)]` is unlikely to be the desired command here", but in fact I have seen situations where that sort of use of `c` within `j` is exactly what I'm after. Situations like (v) are what the `with` argument of `[.data.table` is for.
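To make the difference concrete, a minimal sketch with a hypothetical two-column table: without `with = FALSE`, `c(x1, x2)` is evaluated inside the table and concatenates the two columns; with `with = FALSE`, `j` is treated as a selection of columns:

```r
library(data.table)

dt <- data.table(x1 = 1:3, x2 = 4:6)

# j evaluated as an expression inside dt: one concatenated vector
v <- dt[, c(x1, x2)]

# with = FALSE: j is a set of column names, as in data.frame indexing
d <- dt[, c("x1", "x2"), with = FALSE]
```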
On (vi) and elsewhere, you suggest "The manual only says 'a vector of column names', but experimentation suggests they must be quoted"; but I think this is unambiguous. A vector of column names means a `character` vector, which `c(x1,x2)` is not, unless `x1` and `x2` are themselves defined somewhere as `character` vectors. You can also file an FR for the documentation on GitHub.
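A short sketch of the case where `x1` and `x2` are themselves character variables holding column names (the variables and the table are hypothetical). `setkeyv()` serves as an example of a function that takes a character vector of column names:

```r
library(data.table)

dt <- data.table(a = 1:3, b = 4:6)
x1 <- "a"            # hypothetical variables holding column names
x2 <- "b"

cols <- c(x1, x2)    # now a genuine character vector of column names
dt[, ..cols]         # selects columns a and b
setkeyv(dt, cols)    # accepts the same character vector
key(dt)
```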
I'm not sure what you're after in (vii), but in `i`, vectors of names are used for joins or keyed subsets (also a form of join); see the vignette on fast subsetting.
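A minimal sketch of a keyed subset with a character vector in `i`, on a hypothetical table keyed on `id`; the `on =` form performs the same ad hoc join without setting a key:

```r
library(data.table)

dt <- data.table(id = c("a", "b", "c"), v = 1:3)
setkey(dt, id)

# A character vector in i is joined against the first key column
keyed <- dt[c("a", "c")]

# Equivalent ad hoc join on an explicitly named column
adhoc <- dt[.(c("a", "c")), on = "id"]
```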
Is it possible to name a column of a tibble using a variable containing a character vector (string)?
You can use the following solution:
- To use column names that are stored as strings, we use the bang-bang operator `!!`, which forces evaluation of the name that follows it.
- We also need the walrus operator `:=` instead of `=`. The two are equivalent, but `:=` lets you supply a computed name (as is the case with our variable) on its LHS (left-hand side).
CLADE_FIELD = "Clade"
LINEAGE_FIELD = "Lineage"
metaDF = tibble(!!CLADE_FIELD := c("G"),
                !!LINEAGE_FIELD := c("B.666"),
                "Submission date" = c("2020-03"))
# A tibble: 1 x 3
Clade Lineage `Submission date`
<chr> <chr> <chr>
1 G B.666 2020-03
Or we can use double braces `{{}}` as follows:
metaDF = tibble({{CLADE_FIELD}} := c("G"),
                {{LINEAGE_FIELD}} := c("B.666"),
                "Submission date" = c("2020-03"))
# A tibble: 1 x 3
Clade Lineage `Submission date`
<chr> <chr> <chr>
1 G B.666 2020-03
Or we can use glue syntax: put the variable name within a pair of braces `{}` and pass the result as a string. Since glue syntax is supported on the LHS of `:=`, whatever you put within the curly braces (here, your variable names) is evaluated as R code:
metaDF = tibble("{CLADE_FIELD}" := c("G"),
                "{LINEAGE_FIELD}" := c("B.666"),
                "Submission date" = c("2020-03"))
# A tibble: 1 x 3
Clade Lineage `Submission date`
<chr> <chr> <chr>
1 G B.666 2020-03
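The same `!!`/`:=` and glue-name techniques also work inside `dplyr::mutate()`. A minimal sketch, assuming recent versions of dplyr and rlang; `new_col` and the `_copy` suffix are hypothetical:

```r
library(tibble)
library(dplyr)

new_col <- "Lineage"              # hypothetical column name held in a variable
df <- tibble(Clade = "G")

# !! injects the string stored in new_col as the new column's name
df <- mutate(df, !!new_col := "B.666")

# Glue-style name on the LHS: "{new_col}_copy" becomes "Lineage_copy";
# .data[[new_col]] reads the column whose name is stored in new_col
df <- mutate(df, "{new_col}_copy" := .data[[new_col]])

print(names(df))
```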
Pass column name in data.table using variable
Use the `quote()` and `eval()` functions to pass a variable to `j`. You don't need double quotes around the column names when you do it this way, because `quote()` captures an unevaluated expression (not a string), and `eval()` then evaluates that expression inside `DT[]`:
temp <- quote(x)
DT[ , eval(temp)]
# [1] "b" "b" "b" "a" "a"
With a single column name, the result is a vector. If you want a data.table result, or several columns, use the list form:
temp <- quote(list(x, v))
DT[ , eval(temp)]
# x v
# 1: b 1.52566586
# 2: b 0.66057253
# 3: b -1.29654641
# 4: a -1.71998260
# 5: a 0.03159933
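If the column name arrives as a string rather than as an unquoted symbol, one option is to convert it with `as.name()` first, which produces the same kind of object as `quote(x)`. A minimal sketch that recreates a small `DT` with fixed values (the random `v` column is replaced by hypothetical numbers), plus a few other common idioms for comparison:

```r
library(data.table)

DT <- data.table(x = c("b", "b", "b", "a", "a"),
                 v = c(1.5, 0.7, -1.3, -1.7, 0.03))
col <- "x"                  # column name held as a string

# Convert the string to a symbol, then use it exactly like quote(x)
temp <- as.name(col)
DT[, eval(temp)]

# Other ways to use the string directly
DT[, get(col)]              # vector
DT[[col]]                   # vector, base-R style
DT[, ..col]                 # one-column data.table
```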
data.table: How do I pass a character vector to a function and get data.table to treat its contents as column names?
Picking up MichaelChirico's comment, the function definition can be written as:
log_those_columns <- function(DT, cols_in, cols_new) {
  DT[, (cols_new) := lapply(.SD, log), .SDcols = cols_in]
}
Calling it adds the new columns to `DT` by reference:
log_those_columns(DT, old_names, new_names)
DT
Ozone Solar.R Wind Temp Month Day New_Ozone New_Wind
1: 41 190 7.4 67 5 1 3.713572 2.001480
2: 36 118 8.0 72 5 2 3.583519 2.079442
3: 12 149 12.6 74 5 3 2.484907 2.533697
4: 18 313 11.5 62 5 4 2.890372 2.442347
5: NA NA 14.3 56 5 5 NA 2.660260
---
149: 30 193 6.9 70 9 26 3.401197 1.931521
150: NA 145 13.2 77 9 27 NA 2.580217
151: 14 191 14.3 75 9 28 2.639057 2.660260
152: 18 131 8.0 76 9 29 2.890372 2.079442
153: 20 223 11.5 68 9 30 2.995732 2.442347
as expected.
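A self-contained sketch of the same call, assuming `old_names` is `c("Ozone", "Wind")` and `new_names` is `c("New_Ozone", "New_Wind")`, which matches the output above:

```r
library(data.table)

log_those_columns <- function(DT, cols_in, cols_new) {
  # .SDcols restricts .SD to cols_in; wrapping cols_new in () makes :=
  # use its contents as the target names rather than a column called "cols_new"
  DT[, (cols_new) := lapply(.SD, log), .SDcols = cols_in]
}

DT <- data.table(airquality)
old_names <- c("Ozone", "Wind")          # assumed input columns
new_names <- paste0("New_", old_names)   # assumed output names
log_those_columns(DT, old_names, new_names)

# DT was modified by reference, so the new columns are visible here
head(DT)
```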
A more flexible approach
The function used to transform the data can be passed as a parameter as well:
fct_those_columns <- function(DT, cols_in, cols_new, fct) {
  DT[, (cols_new) := lapply(.SD, fct), .SDcols = cols_in]
}
The call:
fct_those_columns(DT, old_names, new_names, log)
head(DT)
works as expected:
Ozone Solar.R Wind Temp Month Day New_Ozone New_Wind
1: 41 190 7.4 67 5 1 3.713572 2.001480
2: 36 118 8.0 72 5 2 3.583519 2.079442
3: 12 149 12.6 74 5 3 2.484907 2.533697
4: 18 313 11.5 62 5 4 2.890372 2.442347
5: NA NA 14.3 56 5 5 NA 2.660260
6: 28 NA 14.9 66 5 6 3.332205 2.701361
The function name can be passed as character:
fct_those_columns(DT, old_names, new_names, "sqrt")
head(DT)
Ozone Solar.R Wind Temp Month Day New_Ozone New_Wind
1: 41 190 7.4 67 5 1 6.403124 2.720294
2: 36 118 8.0 72 5 2 6.000000 2.828427
3: 12 149 12.6 74 5 3 3.464102 3.549648
4: 18 313 11.5 62 5 4 4.242641 3.391165
5: NA NA 14.3 56 5 5 NA 3.781534
6: 28 NA 14.9 66 5 6 5.291503 3.860052
or as an anonymous function:
fct_those_columns(DT, old_names, new_names, function(x) x^(1/2))
head(DT)
Ozone Solar.R Wind Temp Month Day New_Ozone New_Wind
1: 41 190 7.4 67 5 1 6.403124 2.720294
2: 36 118 8.0 72 5 2 6.000000 2.828427
3: 12 149 12.6 74 5 3 3.464102 3.549648
4: 18 313 11.5 62 5 4 4.242641 3.391165
5: NA NA 14.3 56 5 5 NA 3.781534
6: 28 NA 14.9 66 5 6 5.291503 3.860052
An even more flexible approach
The function below automatically derives the names of the new columns by prepending the name of the function to the names of the input columns:
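library(data.table)
fct_those_columns <- function(DT, cols_in, fct) {
  # capture the unevaluated argument: a symbol (sqrt), a `::` call, or a function(...) call
  fct_name <- substitute(fct)
  # symbols are used as-is; otherwise the third element supplies the label
  # (as.IDate from data.table::as.IDate, or the body x^(1/2) of an anonymous function)
  cols_new <- paste(if (is.name(fct_name)) fct_name else fct_name[3], cols_in, sep = "_")
  DT[, (cols_new) := lapply(.SD, fct), .SDcols = cols_in]
}
DT <- data.table(airquality)
old_names <- c("Ozone", "Wind")
fct_those_columns(DT, old_names, sqrt)
fct_those_columns(DT, old_names, data.table::as.IDate)
fct_those_columns(DT, old_names, function(x) x^(1/2))
DT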
Ozone Solar.R Wind Temp Month Day sqrt_Ozone sqrt_Wind as.IDate_Ozone as.IDate_Wind x^(1/2)_Ozone x^(1/2)_Wind
1: 41 190 7.4 67 5 1 6.403124 2.720294 1970-02-11 1970-01-08 6.403124 2.720294
2: 36 118 8.0 72 5 2 6.000000 2.828427 1970-02-06 1970-01-09 6.000000 2.828427
3: 12 149 12.6 74 5 3 3.464102 3.549648 1970-01-13 1970-01-13 3.464102 3.549648
4: 18 313 11.5 62 5 4 4.242641 3.391165 1970-01-19 1970-01-12 4.242641 3.391165
5: NA NA 14.3 56 5 5 NA 3.781534 <NA> 1970-01-15 NA 3.781534
---
149: 30 193 6.9 70 9 26 5.477226 2.626785 1970-01-31 1970-01-07 5.477226 2.626785
150: NA 145 13.2 77 9 27 NA 3.633180 <NA> 1970-01-14 NA 3.633180
151: 14 191 14.3 75 9 28 3.741657 3.781534 1970-01-15 1970-01-15 3.741657 3.781534
152: 18 131 8.0 76 9 29 4.242641 2.828427 1970-01-19 1970-01-09 4.242641 2.828427
153: 20 223 11.5 68 9 30 4.472136 3.391165 1970-01-21 1970-01-12 4.472136 3.391165
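The naming logic is easiest to see in isolation. A minimal sketch that pulls it into a hypothetical helper, `name_of()`; since `substitute()` never evaluates its argument, data.table does not even need to be loaded for the `data.table::as.IDate` case:

```r
# Hypothetical helper reproducing the naming step of fct_those_columns()
name_of <- function(fct) {
  fct_name <- substitute(fct)
  # fct_name[3] is a one-element call; as.character() deparses that element
  if (is.name(fct_name)) as.character(fct_name) else as.character(fct_name[3])
}

name_of(sqrt)
name_of(data.table::as.IDate)
name_of(function(x) x^(1/2))
```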
Note that `x^(1/2)_Ozone` is not a syntactically valid name in R and needs to be put in backquotes:
DT$`x^(1/2)_Ozone`
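If you would rather avoid backquotes altogether, one option (a swapped-in technique, not part of the function above) is to pass the names through base R's `make.names()`, which replaces invalid characters with dots. A minimal sketch on a one-column table:

```r
library(data.table)

bad <- "x^(1/2)_Ozone"
make.names(bad) == bad        # FALSE: not a syntactically valid name

DT <- data.table(a = 1)
setnames(DT, "a", bad)
DT$`x^(1/2)_Ozone`            # backquotes needed with $
DT[[bad]]                     # [[ ]] with the string needs none

# Rename every column to a syntactically valid name, by reference
setnames(DT, make.names(names(DT)))
names(DT)
```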