Getting Strings Recognized as Variable Names in R

Convert string to a variable name

assign is what you are looking for.

assign("x", 5)

x
[1] 5

But buyer beware; see R FAQ 7.21:
http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-can-I-turn-a-string-into-a-variable_003f
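As a hedged illustration of that caveat, the sketch below creates a variable from a string with assign(), reads it back with get(), and then does the same job with a named list, which is often easier to manage. The names var_name and values are made up for this example.

```r
# Name of the variable to create, held as a string
var_name <- "x"

# Create the variable in the current environment and read it back by name
assign(var_name, 5)
from_env <- get(var_name)

# Alternative: keep the value in a named list instead of a free-standing variable
values <- list()
values[[var_name]] <- 5
from_list <- values[[var_name]]

print(c(from_env, from_list))
```

With the list, all dynamically named values live in one object that can be passed to lapply() or inspected with names().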

Pass a string as variable name in dplyr::filter

!! (or UQ()) evaluates the variable, so with var <- 'cyl', mtcars %>% filter(!!var == 4) is the same as mtcars %>% filter('cyl' == 4): the string 'cyl' is compared with 4, which is always FALSE. You can check this by printing !!var inside filter():

mtcars %>% filter({ print(!!var); (!!var) == 4 })
# [1] "cyl"
# [1] mpg cyl disp hp drat wt qsec vs am gear carb
# <0 rows> (or 0-length row.names)

To make var refer to the cyl column, first convert the string to the symbol cyl, then unquote that symbol so it is evaluated as a column:

Using rlang:

library(rlang)
var <- 'cyl'
mtcars %>% filter((!!sym(var)) == 4)

# mpg cyl disp hp drat wt qsec vs am gear carb
#1 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#2 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#3 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
# ...

Or use as.symbol()/as.name() from base R:

mtcars %>% filter((!!as.symbol(var)) == 4)

mtcars %>% filter((!!as.name(var)) == 4)
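Another option, sketched here on the assumption that a reasonably recent dplyr is installed, is the .data pronoun, which looks a column up by its string name without any unquoting. The name four_cyl is only for illustration.

```r
library(dplyr)

var <- "cyl"

# .data[[var]] refers to the column whose name is stored in var
four_cyl <- mtcars %>% filter(.data[[var]] == 4)

nrow(four_cyl)  # number of 4-cylinder cars in mtcars
```

This avoids building a symbol at all, which can be handy when the column name arrives as a function argument.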

R: turn string into variable names in `tidytable::complete()`

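We need syms() to convert the strings to symbols, and !!! to splice them into the call as separate arguments:

input %>%
  tidytable::complete.(!!! rlang::syms(columns))  # recent tidytable releases drop the trailing dot: complete()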

Output:

# A tidytable: 657 × 3
   a     b         c
   <chr> <chr> <dbl>
 1 A     a        NA
 2 A     b        NA
 3 A     c        NA
 4 A     d        NA
 5 A     e        NA
 6 A     f        NA
 7 A     g        NA
 8 A     h        NA
 9 A     i        NA
10 A     j        NA
# … with 647 more rows
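To see what the splice does, here is a small sketch. It uses tidyr::complete() rather than tidytable (a swap made only to keep the example self-contained), and the input data frame and columns vector are invented for illustration.

```r
library(rlang)
library(tidyr)

# Hypothetical inputs standing in for the question's data
input <- data.frame(a = c("A", "B"), b = c("a", "b"), c = c(1, 2))
columns <- c("a", "b")

# !!! splices the list of symbols in as separate arguments;
# expr() captures the resulting call so it can be inspected
spliced <- expr(complete(input, !!!syms(columns)))
print(spliced)

# The same splice inside a real call: every combination of a and b
out <- complete(input, !!!syms(columns))
print(out)
```

Printing spliced shows the call the splice builds, with the column names as bare arguments instead of strings.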

How to assign vector of strings as variable names, in for loop, in data.table, in dplyr

This is a common situation which can be handled with ease by using a list.

This is what I would do if the data files are different in structure, i.e., columns differ in names, data types, or order:

library(data.table)
file_names <- list.files(pattern = "\\.csv$")  # pattern is a regular expression, not a glob
list_of_df <- lapply(file_names, fread)
list_of_df <- setNames(list_of_df, file_names)
list_of_df
$area.csv
id name
1: 1 normal name
2: 2 with,comma
3: 3 with%percent

$farmland.csv
id name
1: 1 normal name
2: 2 with,comma
3: 3 with%percent

$GDPpercapita.csv
id name
1: 1 normal name
2: 2 with,comma
3: 3 with%percent

Note that I have made up three sample files for demonstration. See the Data section below for details.

The elements of the resulting list object list_of_df are named like the files the data were loaded from.

Now, we can operate on the elements of the list using lapply() or a for loop, e.g.,

lapply(
  list_of_df,
  function(df) df[, lapply(.SD, function(col) if (is.character(col)) stringr::str_remove_all(col, "[,%]") else col)]
)

$area.csv
id name
1: 1 normal name
2: 2 withcomma
3: 3 withpercent

$farmland.csv
id name
1: 1 normal name
2: 2 withcomma
3: 3 withpercent

$GDPpercapita.csv
id name
1: 1 normal name
2: 2 withcomma
3: 3 withpercent

Note that the code to remove , and % has been simplified.

lapply() has the advantage over a for loop that it returns a list again, which is convenient for subsequent processing steps.


As a side note: data.table has a special feature in that it can update by reference, i.e., without copying the data.table. So, we can update list_of_df in place, which might be a benefit in terms of speed and memory consumption for large datasets:

address(list_of_df) # just for demonstration
for (df in list_of_df) {
  cols <- which(sapply(df, is.character))
  df[, (cols) := lapply(.SD, stringr::str_remove_all, "[,%]"), .SDcols = cols]
}
address(list_of_df)

The calls to address(list_of_df) before and after the for loop have been added to demonstrate that list_of_df still occupies the same storage location but has been changed in place.

list_of_df
$area.csv
id name
1: 1 normal name
2: 2 withcomma
3: 3 withpercent

$farmland.csv
id name
1: 1 normal name
2: 2 withcomma
3: 3 withpercent

$GDPpercapita.csv
id name
1: 1 normal name
2: 2 withcomma
3: 3 withpercent
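The in-place update can also be checked per element without touching the disk. The following is a minimal sketch under a few assumptions: the two small tables are invented, base R's gsub() stands in for stringr::str_remove_all(), and the names list_of_dt, before and after are hypothetical.

```r
library(data.table)

# Two made-up tables standing in for the files read with fread()
list_of_dt <- list(
  a = data.table(id = 1:2, name = c("with,comma", "with%percent")),
  b = data.table(id = 1:2, name = c("normal name", "x,y%"))
)

# Record the memory address of each data.table before the update
before <- vapply(list_of_dt, address, character(1))

for (dt in list_of_dt) {
  cols <- names(dt)[vapply(dt, is.character, logical(1))]
  # := modifies dt by reference, so the list element itself changes
  dt[, (cols) := lapply(.SD, function(x) gsub("[,%]", "", x)), .SDcols = cols]
}

after <- vapply(list_of_dt, address, character(1))

print(identical(before, after))
print(list_of_dt)
```

Comparing the addresses element by element shows that no data.table was copied, while the printed list shows the cleaned strings.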

If the datasets read from file have a similar structure, i.e., the same names, order, and data types of columns, we can combine the single pieces into one large dataset using rbindlist().

My preferred workflow for this use case is along these lines:

library(data.table)
library(magrittr)
file_names <- list.files(pattern = "\\.csv$")  # pattern is a regular expression, not a glob
big_df <- lapply(file_names, fread) %>%
  set_names(file_names) %>%
  rbindlist(idcol = "file_name")
big_df
          file_name id         name
1:         area.csv  1  normal name
2:         area.csv  2   with,comma
3:         area.csv  3 with%percent
4:     farmland.csv  1  normal name
5:     farmland.csv  2   with,comma
6:     farmland.csv  3 with%percent
7: GDPpercapita.csv  1  normal name
8: GDPpercapita.csv  2   with,comma
9: GDPpercapita.csv  3 with%percent

Note that rbindlist() has created an id column from the names of the list elements. This allows for distinguishing the origin of each row.

Working with one uniform data structure simplifies subsequent processing:

cols <- which(sapply(big_df, is.character))
big_df[, (cols) := lapply(.SD, stringr::str_remove_all, "[,%]"), .SDcols = cols]
big_df
          file_name id        name
1:         area.csv  1 normal name
2:         area.csv  2   withcomma
3:         area.csv  3 withpercent
4:     farmland.csv  1 normal name
5:     farmland.csv  2   withcomma
6:     farmland.csv  3 withpercent
7: GDPpercapita.csv  1 normal name
8: GDPpercapita.csv  2   withcomma
9: GDPpercapita.csv  3 withpercent

As the OP is using mutate(), here is an all-"tidyverse" approach. It does essentially the same as the data.table versions above:

library(purrr)
library(dplyr)
file_names <- list.files(pattern = "\\.csv$")  # pattern is a regular expression, not a glob
list_of_df <- map(file_names, readr::read_csv) %>%
  set_names(file_names)

list_of_df %>%
map( ~ mutate(.x, across(where(is.character), ~ stringr::str_remove_all(.x, "[,%]"))))
$area.csv
# A tibble: 3 x 2
id name
<dbl> <chr>
1 1 normal name
2 2 withcomma
3 3 withpercent

$farmland.csv
# A tibble: 3 x 2
id name
<dbl> <chr>
1 1 normal name
2 2 withcomma
3 3 withpercent

$GDPpercapita.csv
# A tibble: 3 x 2
id name
<dbl> <chr>
1 1 normal name
2 2 withcomma
3 3 withpercent

map() is the equivalent of base R's lapply(). Also readr::read_csv() is used instead of data.table's fread().
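For files that share one structure, a tidyverse counterpart of rbindlist(idcol = ...) is dplyr::bind_rows(.id = ...). A minimal sketch, using two made-up in-memory tibbles in place of the CSV files and base R's gsub() in place of stringr::str_remove_all():

```r
library(dplyr)

# Hypothetical stand-ins for the data frames read from file
list_of_df <- list(
  "area.csv"     = tibble(id = 1:2, name = c("normal name", "with,comma")),
  "farmland.csv" = tibble(id = 1:2, name = c("with%percent", "normal name"))
)

# .id turns the list names into a column, so each row keeps its origin
big_df <- bind_rows(list_of_df, .id = "file_name") %>%
  mutate(name = gsub("[,%]", "", name))

print(big_df)
```

As with the data.table version, the extra file_name column makes it possible to filter or group by source file later.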

Data

Caveat: The code below will create 3 files in the current working directory!

library(data.table)
dummy <- data.table(id = 1:3, name = c("normal name", "with,comma", "with%percent"))
extern <- c("area.csv", "farmland.csv", "GDPpercapita.csv")
for (fn in extern) fwrite(dummy, fn)

The code saves a dummy data.table to disk three times as a CSV file, using three different file names.

Strings as variable references in an R formula

One solution is to build the formula up using paste() and convert it to a formula:

> ## your example plus some dummy data
> var1 <- "V001"
> var2 <- "V002"
> var3 <- "V003"
> dat <- data.frame(V001 = runif(10), V002 = runif(10), V003 = runif(10))
> f <- formula(paste(var1, "~", var2, "+", var3))

Now we can look at f:

> f
V001 ~ V002 + V003
> class(f)
[1] "formula"

and it really is a formula. We can now pass this into rlm() as the first argument:

> require(MASS)
> mod <- rlm(f, data = dat)
> mod
Call:
rlm(formula = f, data = dat)
Converged in 8 iterations

Coefficients:
(Intercept)        V002        V003
  0.2725538  -0.1281576   0.1617250

Degrees of freedom: 10 total; 7 residual
Scale estimate: 0.251
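Base R also has reformulate(), which builds the same kind of formula from character vectors without pasting a string together. The sketch below is only an illustration: it fits with lm() instead of rlm() to avoid the MASS dependency, and the seed and dummy data are arbitrary.

```r
var1 <- "V001"
var2 <- "V002"
var3 <- "V003"

# termlabels become the right-hand side, response the left-hand side
f <- reformulate(c(var2, var3), response = var1)
print(f)

# Arbitrary dummy data, seeded so the fit is reproducible
set.seed(1)
dat <- data.frame(V001 = runif(10), V002 = runif(10), V003 = runif(10))

mod <- lm(f, data = dat)
print(names(coef(mod)))
```

reformulate() scales naturally to a longer vector of predictor names, where a paste() call would need collapse = " + ".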

HTH


