Convert string to a variable name
assign() is what you are looking for:
assign("x", 5)
x
[1] 5
But buyer beware; see R FAQ 7.21:
http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-can-I-turn-a-string-into-a-variable_003f
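As a hedged illustration of the "buyer beware" point: a named list is often easier to manage than variables created from strings. The sketch below shows both routes side by side; the names `name` and `values` are made up for this example and are not part of the original answer.

```r
# Hypothetical names: `name` and `values` are illustrative only
name <- "x"

# Route 1: create a variable from a string, then read it back by name
assign(name, 5)
from_get <- get(name)

# Route 2: store the value in a named list instead of the environment
values <- list()
values[[name]] <- 5
from_list <- values[[name]]

print(from_get)
print(from_list)
```

With the list, all dynamically named values stay in one object that can be passed to lapply() or inspected with names().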
Pass a string as variable name in dplyr::filter
!! (or UQ) evaluates the variable, so mtcars %>% filter(!!var == 4) is the same as mtcars %>% filter('cyl' == 4), where the condition always evaluates to FALSE. You can prove this by printing !!var inside the filter() call:
mtcars %>% filter({ print(!!var); (!!var) == 4 })
# [1] "cyl"
# [1] mpg cyl disp hp drat wt qsec vs am gear carb
# <0 rows> (or 0-length row.names)
To evaluate var to the cyl column, you need to convert var to the symbol cyl first, then evaluate that symbol to a column.
Using rlang:
library(dplyr)  # provides %>% and filter()
library(rlang)  # provides sym()
var <- 'cyl'
mtcars %>% filter((!!sym(var)) == 4)
# mpg cyl disp hp drat wt qsec vs am gear carb
#1 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#2 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#3 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
# ...
Or use as.symbol()/as.name() from base R:
mtcars %>% filter((!!as.symbol(var)) == 4)
mtcars %>% filter((!!as.name(var)) == 4)
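The two steps described above (string to symbol, symbol to column) can also be traced without dplyr. The following is only a base R sketch of the idea, not what filter() does internally; `col` and `four_cyl` are illustrative names.

```r
var <- "cyl"

# Comparing the string itself with 4 is FALSE, which is why no rows match
string_test <- var == 4

# Step 1: convert the string to a symbol
sym_cyl <- as.name(var)

# Step 2: evaluate the symbol with the data frame as the lookup environment
col <- eval(sym_cyl, mtcars)

# The result is the cyl column, which can now be used for subsetting
four_cyl <- mtcars[col == 4, ]

print(string_test)
print(nrow(four_cyl))
```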
R: turn string into variable names in `tidytable::complete()`
We need syms() to convert the strings to symbols and then splice them into the call with !!!:
input %>%
tidytable::complete.(!!! rlang::syms(columns))
Output:
# A tidytable: 657 × 3
a b c
<chr> <chr> <dbl>
1 A a NA
2 A b NA
3 A c NA
4 A d NA
5 A e NA
6 A f NA
7 A g NA
8 A h NA
9 A i NA
10 A j NA
# … with 647 more rows
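To make the "convert, then splice" step more concrete, here is a rough base R analogue. It is a sketch under the assumption that lapply(columns, as.name) plays the role of rlang::syms() and that building a call with as.call() plays the role of !!!; the vector `columns` and the use of paste() are made up for the illustration.

```r
# Hypothetical column names for illustration
columns <- c("a", "b")

# A list of symbols, comparable to what rlang::syms(columns) returns
symbols <- lapply(columns, as.name)

# Splice the symbols in as separate arguments of a call: paste(a, b)
cl <- as.call(c(list(as.name("paste")), symbols))
print(cl)

# Evaluating the call looks up a and b as variables, not as strings
a <- "x"
b <- "y"
result <- eval(cl)
print(result)
```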
How to assign vector of strings as variable names, in for loop, in data.table, in dplyr
This is a common situation which can be handled with ease by using a list.
This is what I would do if the data files are different in structure, i.e., columns differ in names, data types, or order:
library(data.table)
file_names <- list.files(pattern = "\\.csv$")  # pattern is a regular expression, not a glob
list_of_df <- lapply(file_names, fread)
list_of_df <- setNames(list_of_df, file_names)
list_of_df
$area.csv
id name
1: 1 normal name
2: 2 with,comma
3: 3 with%percent
$farmland.csv
id name
1: 1 normal name
2: 2 with,comma
3: 3 with%percent
$GDPpercapita.csv
id name
1: 1 normal name
2: 2 with,comma
3: 3 with%percent
Note that I have made up three sample files for demonstration. See Data section for details.
The elements of the resulting list object list_of_df are named like the files the data were loaded from.
Now, we can operate on the elements of the list using lapply() or a for loop, e.g.,
lapply(
  list_of_df,
  function(df) {
    # strip "," and "%" from character columns; return other columns unchanged
    df[, lapply(.SD, function(col) {
      if (is.character(col)) stringr::str_remove_all(col, "[,%]") else col
    })]
  }
)
$area.csv
id name
1: 1 normal name
2: 2 withcomma
3: 3 withpercent
$farmland.csv
id name
1: 1 normal name
2: 2 withcomma
3: 3 withpercent
$GDPpercapita.csv
id name
1: 1 normal name
2: 2 withcomma
3: 3 withpercent
Note that the code to remove , and % has been simplified.
lapply() has the advantage over a for loop that it returns a list again, which is convenient for subsequent processing steps.
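For readers without data.table or stringr at hand, the same list-based pattern can be sketched in base R. This is an assumption-laden stand-in rather than the code from the answer: the data frames are built inline instead of being read from files, and gsub() replaces stringr::str_remove_all().

```r
# Inline stand-ins for the data read from the CSV files
dummy <- data.frame(
  id = 1:3,
  name = c("normal name", "with,comma", "with%percent"),
  stringsAsFactors = FALSE
)
list_of_df <- list(area.csv = dummy, farmland.csv = dummy, GDPpercapita.csv = dummy)

# lapply() returns a new named list, one cleaned data frame per element
cleaned <- lapply(list_of_df, function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  # remove "," and "%" from character columns only
  df[is_chr] <- lapply(df[is_chr], gsub, pattern = "[,%]", replacement = "")
  df
})

print(cleaned$area.csv)
```

Because the result is again a named list, it can be fed straight into the next lapply() step.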
As a side note: data.table has the special ability to update by reference, i.e., without copying the data.table. So, we can update list_of_df in place, which might be a benefit in terms of speed and memory consumption for large datasets:
address(list_of_df) # just for demonstration
for (df in list_of_df) {
cols <- which(sapply(df, is.character))
df[, (cols) := lapply(.SD, stringr::str_remove_all, "[,%]"), .SDcols = cols]
}
address(list_of_df)
The calls to address(list_of_df) before and after the for loop have been added to demonstrate that list_of_df still occupies the same storage location but has been changed in place.
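A slightly stricter check is to compare the address of each data.table in the list, not only the address of the list itself. The sketch below assumes the data.table package is installed; the list `list_of_dt` and its contents are invented for the demonstration, and gsub() is used in place of stringr.

```r
library(data.table)

# Hypothetical sample data standing in for the files read with fread()
list_of_dt <- list(
  a = data.table(id = 1:2, name = c("with,comma", "with%percent")),
  b = data.table(id = 1:2, name = c("normal name", "x,y"))
)

# Record the storage location of every data.table before the update
before <- vapply(list_of_dt, address, character(1))

for (dt in list_of_dt) {
  cols <- names(dt)[vapply(dt, is.character, logical(1))]
  # := modifies dt by reference, so the list element changes as well
  dt[, (cols) := lapply(.SD, function(x) gsub("[,%]", "", x)), .SDcols = cols]
}

after <- vapply(list_of_dt, address, character(1))

print(identical(before, after))
print(list_of_dt$a$name)
```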
list_of_df
$area.csv
id name
1: 1 normal name
2: 2 withcomma
3: 3 withpercent
$farmland.csv
id name
1: 1 normal name
2: 2 withcomma
3: 3 withpercent
$GDPpercapita.csv
id name
1: 1 normal name
2: 2 withcomma
3: 3 withpercent
In case the datasets read from file have a similar structure, i.e., the same name, order, and data type of columns, we can combine the single pieces into one large dataset using rbindlist().
My preferred workflow for this use case is along these lines:
library(data.table)
library(magrittr)
file_names <- list.files(pattern = "\\.csv$")  # pattern is a regular expression, not a glob
big_df <- lapply(file_names, fread) %>%
set_names(file_names) %>%
rbindlist(idcol = "file_name")
big_df
file_name id name
1: area.csv 1 normal name
2: area.csv 2 with,comma
3: area.csv 3 with%percent
4: farmland.csv 1 normal name
5: farmland.csv 2 with,comma
6: farmland.csv 3 with%percent
7: GDPpercapita.csv 1 normal name
8: GDPpercapita.csv 2 with,comma
9: GDPpercapita.csv 3 with%percent
Note that rbindlist() has created an id column from the names of the list elements. This allows for distinguishing the origin of each row.
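What the idcol argument contributes can be approximated in base R, which may help when reading the output above. This is only a sketch of the idea and not how rbindlist() is implemented; the two small data frames and their names are made up.

```r
# Hypothetical named list of data frames with identical structure
list_of_df <- list(
  a.csv = data.frame(id = 1:2, name = c("p", "q"), stringsAsFactors = FALSE),
  b.csv = data.frame(id = 1:2, name = c("r", "s"), stringsAsFactors = FALSE)
)

# Prepend the element name as a column to each piece, then stack the pieces
pieces <- Map(
  function(df, nm) cbind(file_name = nm, df, stringsAsFactors = FALSE),
  list_of_df,
  names(list_of_df)
)
big_df <- do.call(rbind, pieces)
rownames(big_df) <- NULL

print(big_df)
```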
Working with one uniform data structure simplifies subsequent processing:
cols <- which(sapply(big_df, is.character))
big_df[, (cols) := lapply(.SD, stringr::str_remove_all, "[,%]"), .SDcols = cols]
big_df
file_name id name
1: area.csv 1 normal name
2: area.csv 2 withcomma
3: area.csv 3 withpercent
4: farmland.csv 1 normal name
5: farmland.csv 2 withcomma
6: farmland.csv 3 withpercent
7: GDPpercapita.csv 1 normal name
8: GDPpercapita.csv 2 withcomma
9: GDPpercapita.csv 3 withpercent
As the OP is using mutate(), here is an all-"tidyverse" approach. It does essentially the same as the data.table versions above:
library(purrr)
library(dplyr)
file_names <- list.files(pattern = "\\.csv$")  # pattern is a regular expression, not a glob
list_of_df <- map(file_names, readr::read_csv) %>%
set_names(file_names)
list_of_df %>%
map( ~ mutate(.x, across(where(is.character), ~ stringr::str_remove_all(.x, "[,%]"))))
$area.csv
# A tibble: 3 x 2
id name
<dbl> <chr>
1 1 normal name
2 2 withcomma
3 3 withpercent
$farmland.csv
# A tibble: 3 x 2
id name
<dbl> <chr>
1 1 normal name
2 2 withcomma
3 3 withpercent
$GDPpercapita.csv
# A tibble: 3 x 2
id name
<dbl> <chr>
1 1 normal name
2 2 withcomma
3 3 withpercent
map() is the equivalent of base R's lapply(). Also, readr::read_csv() is used instead of data.table's fread().
Data
Caveat: The code below will create 3 files in the current working directory!
library(data.table)
dummy <- data.table(id = 1:3, name = c("normal name", "with,comma", "with%percent"))
extern <- c("area.csv", "farmland.csv", "GDPpercapita.csv")
for (fn in extern) fwrite(dummy, fn)
The code saves a dummy data.table to disk three times as a CSV file, using three different file names.
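If writing into the current working directory is not desired, the same sample files can be created in a temporary directory instead. The following base R sketch assumes a subfolder name `csv_demo` (hypothetical) and uses write.csv()/read.csv() in place of fwrite()/fread().

```r
# Hypothetical temporary folder so the working directory stays untouched
tmp_dir <- file.path(tempdir(), "csv_demo")
dir.create(tmp_dir, showWarnings = FALSE)

dummy <- data.frame(
  id = 1:3,
  name = c("normal name", "with,comma", "with%percent"),
  stringsAsFactors = FALSE
)
extern <- c("area.csv", "farmland.csv", "GDPpercapita.csv")
for (fn in extern) write.csv(dummy, file.path(tmp_dir, fn), row.names = FALSE)

# Read the files back into a list named after the files
file_names <- list.files(tmp_dir, pattern = "\\.csv$")
list_of_df <- lapply(file.path(tmp_dir, file_names), read.csv, stringsAsFactors = FALSE)
list_of_df <- setNames(list_of_df, file_names)

print(names(list_of_df))
```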
Strings as variable references in an R formula
One solution is to build the formula up using paste() and convert it to a formula:
> ## your example plus some dummy data
> var1 <- "V001"
> var2 <- "V002"
> var3 <- "V003"
> dat <- data.frame(V001 = runif(10), V002 = runif(10), V003 = runif(10))
> f <- formula(paste(var1, "~", var2, "+", var3))
Now we can look at f:
> f
V001 ~ V002 + V003
> class(f)
[1] "formula"
and it really is a formula. We can now pass this into rlm() as the first argument:
> require(MASS)
> mod <- rlm(f, data = dat)
> mod
Call:
rlm(formula = f, data = dat)
Converged in 8 iterations
Coefficients:
(Intercept) V002 V003
0.2725538 -0.1281576 0.1617250
Degrees of freedom: 10 total; 7 residual
Scale estimate: 0.251
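As a possible alternative to paste(), base R's reformulate() builds the same formula from character vectors. The sketch below swaps in lm() for rlm() so that it runs without MASS; set.seed() is added only to make the dummy data reproducible.

```r
var1 <- "V001"
var2 <- "V002"
var3 <- "V003"

set.seed(1)  # reproducible dummy data
dat <- data.frame(V001 = runif(10), V002 = runif(10), V003 = runif(10))

# reformulate() takes the right-hand side terms and the response as strings
f <- reformulate(c(var2, var3), response = var1)
print(f)

# lm() is used here as a stand-in for MASS::rlm()
mod <- lm(f, data = dat)
print(names(coef(mod)))
```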
HTH