How to use reference variables by character string in a formula?
I see a couple of issues here. First, and I don't think this is causing any trouble, let's make your data frame in one step so you don't have v1 through v4 floating around both in the global environment and in the data frame. Second, let's make v2 a factor here so that we won't have to deal with making it a factor later.
dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(sample(c(0,1), 10, replace=TRUE)),
                  v3 = rnorm(10),
                  v4 = rnorm(10))
Part One
Now, for your first part, it looks like this is what you want:
lm(v1 ~ v2 + v3 + v4, data=dat)
Here's a simpler way to do that, though you still have to specify the response variable.
lm(v1 ~ ., data=dat)
Alternatively, you certainly can build up the formula with paste and call lm on it.
f <- paste(names(dat)[1], "~", paste(names(dat)[-1], collapse=" + "))
# "v1 ~ v2 + v3 + v4"
lm(f, data=dat)
However, my preference in these situations is to use do.call, which evaluates expressions before passing them to the function; this makes the resulting object more suitable for calling functions like update on. Compare the call part of the output.
do.call("lm", list(as.formula(f), data=as.name("dat")))
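To make the difference in the call component concrete, here is a minimal sketch. It assumes a deterministic stand-in for dat (v2 alternates 0/1 so both factor levels are always present); the names fit_str, fit_call and fit_small are introduced only for illustration.

```r
# Stand-in data frame; the values are arbitrary, only the column names matter.
set.seed(1)
dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(rep(c(0, 1), 5)),
                  v3 = rnorm(10),
                  v4 = rnorm(10))
f <- "v1 ~ v2 + v3 + v4"

fit_str  <- lm(f, data = dat)
fit_call <- do.call("lm", list(as.formula(f), data = as.name("dat")))

fit_str$call    # stores the symbol f, not the formula it held
fit_call$call   # stores the expanded formula v1 ~ v2 + v3 + v4

# update() re-evaluates the stored call, here with v4 dropped.
fit_small <- update(fit_call, . ~ . - v4)
formula(fit_small)
```

Because the stored call of fit_call carries the formula itself, it still makes sense after f has been changed or removed.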
Part Two
About your second part, it looks like this is what you're going for:
lm(v1 ~ factor(v2) + v3 + v4 + v2*v3 + v2*v4, data=dat)
First, because v2 is already a factor in the data frame, we don't need the factor() call, and second, this can be simplified further by making better use of R's formula operators to create the interactions, like this.
lm(v1 ~ v2*(v3 + v4), data=dat)
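As a quick check that the shorter formula describes the same model, the expanded term labels of both forms can be compared with terms(). This is only a sketch; the names long and short are hypothetical.

```r
# Both formulas use the column names from the example data frame.
long  <- v1 ~ v2 + v3 + v4 + v2*v3 + v2*v4
short <- v1 ~ v2 * (v3 + v4)

# term.labels lists the main effects and interactions after expansion.
attr(terms(long),  "term.labels")
attr(terms(short), "term.labels")
```

Both calls should list the three main effects plus the v2:v3 and v2:v4 interactions.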
I'd then simply create the formula using paste; the loop with assign, even in the larger case, is probably not a good idea.
f <- paste(names(dat)[1], "~", names(dat)[2], "* (",
paste(names(dat)[-c(1:2)], collapse=" + "), ")")
# "v1 ~ v2 * ( v3 + v4 )"
It can then be called using either lm directly or with do.call.
lm(f, data=dat)
do.call("lm", list(as.formula(f), data=as.name("dat")))
About your code
The problem you had with trying to use r3 etc. was that you wanted the contents of the variable r3, not the string "r3" itself. To get the contents, you need get, like this, and then you'd collapse the values together with paste.
vars <- sapply(paste0("r", 3:6), get)
paste(vars, collapse=" + ")
However, a better way would be to avoid assign and just build a vector of the terms you want, like this.
vars <- NULL
for (v in 3:4) {
vars <- c(vars, colnames(dat)[v], paste(colnames(dat)[2],
colnames(dat)[v], sep="*"))
}
paste(vars, collapse=" + ")
A more R-like solution would be to use lapply:
vars <- unlist(lapply(colnames(dat)[3:4],
function(x) c(x, paste(colnames(dat)[2], x, sep="*"))))
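For completeness, a sketch of how the resulting vars vector might be turned into a formula and fitted. It assumes a stand-in dat with the same column names as above; rhs, f and fit are names introduced here for illustration.

```r
# Stand-in data frame with the column names used throughout this answer.
set.seed(1)
dat <- data.frame(v1 = rnorm(10),
                  v2 = factor(rep(c(0, 1), 5)),
                  v3 = rnorm(10),
                  v4 = rnorm(10))

# Each covariate, followed by its interaction with the second column.
vars <- unlist(lapply(colnames(dat)[3:4],
                      function(x) c(x, paste(colnames(dat)[2], x, sep = "*"))))
vars

# Collapse into the right-hand side and prepend the response.
rhs <- paste(vars, collapse = " + ")
f   <- as.formula(paste(colnames(dat)[1], "~", rhs))
fit <- lm(f, data = dat)
```

The first column is treated as the response and the second as the factor to interact with, so the same code should carry over to wider data frames by changing only the 3:4 index.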
R how to make assignments and reference a variable name converted from a string
We can use mget to return the values of the objects in a list and then cbind the list elements to a single dataset with do.call:
do.call(cbind, mget(namevector))
# id.96 id.99
#[1,] 2 52
#[2,] 3 53
(assuming per is id).
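A self-contained sketch of the same idea. The objects id.96 and id.99 and the way namevector is built are assumptions chosen to reproduce the output shown above.

```r
# Hypothetical objects whose names are only known as strings.
id.96 <- c(2, 3)
id.99 <- c(52, 53)
namevector <- paste("id", c(96, 99), sep = ".")

# mget() looks the names up and returns a named list of their values;
# do.call(cbind, ...) binds the list elements as columns.
res <- do.call(cbind, mget(namevector))
res
```

The column names of res come from the names of the list that mget returns, so no separate renaming step is needed.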
How to convert a string to a formula with the type of `language` in R?
Use parse:
y <- "1 - sin(x^3)"
p <- parse(text = y)[[1]]
p
## 1 - sin(x^3)
is.language(p)
## [1] TRUE
typeof(p)
## [1] "language"
x <- pi/4
eval(p)
## [1] 0.5342579
Note that is.language(parse(text = y)) is also TRUE but it is of type expression. On the other hand, eval(parse(text = y)) gives the same result.
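The distinction between the expression vector and the call inside it can be seen in a short sketch. str2lang() is a base R function (available from R 3.6.0) that returns the call directly; using it here is a suggestion, not part of the original answer.

```r
y <- "1 - sin(x^3)"

e <- parse(text = y)   # an expression vector holding one call
typeof(e)              # "expression"
p <- e[[1]]            # the call itself
typeof(p)              # "language"

q <- str2lang(y)       # same call, without the intermediate vector
typeof(q)              # "language"

x <- pi / 4
c(eval(e), eval(p), eval(q))   # all three evaluate to the same number
```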
Passing a character vector of variables into selection() formula
Wrap your paste calls with as.formula:
selection(as.formula(paste("y_prob", "~", paste(x_vars[1:4], collapse = " + "))),
as.formula(paste("y", "~", paste(x_vars[3:5], collapse = " + "))), data)
Call:
selection(selection = as.formula(paste("y_prob", "~", paste(x_vars[1:4], collapse = " + "))), outcome = as.formula(paste("y", "~", paste(x_vars[3:5], collapse = " + "))), data = data)
Coefficients:
S:(Intercept) S:x1 S:x2 S:x3 S:x4 O:(Intercept) O:x3 O:x4 O:x5 sigma
-1.936e-01 -5.851e-05 7.020e-05 5.475e-05 2.811e-05 2.905e+02 2.286e-01 2.437e-01 2.165e-01 4.083e+02
rho
1.000e+00
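As an alternative to paste plus as.formula, base R's reformulate() builds the same formulas from a character vector. The sketch below assumes x_vars holds the names x1 to x5, as the coefficient names above suggest; sel_f and out_f are hypothetical names, and the selection() call is left commented out because it needs the original data.

```r
# Assumed contents of x_vars, inferred from the coefficient names above.
x_vars <- paste0("x", 1:5)

sel_f <- reformulate(x_vars[1:4], response = "y_prob")
out_f <- reformulate(x_vars[3:5], response = "y")

sel_f   # y_prob ~ x1 + x2 + x3 + x4
out_f   # y ~ x3 + x4 + x5

# selection(sel_f, out_f, data)   # same call as above; not run here
```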
How to make lm interpret eval in formula
Do not use eval(parse()) until you are an advanced R user (and then you usually won't need it). Just use as.formula:
lm(as.formula(paste0("Y ~ ", XT2)), data=Test)
Note that a better strategy for your goal would be:
lm(Y ~ ., data=Test[, c("Y", "X1", "X2")])
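A small sketch showing that the two approaches fit the same model. The data frame Test and the string XT2 are made up here; the original question's objects are not shown, so treat both as assumptions.

```r
# Hypothetical data with a response Y and three candidate predictors.
set.seed(42)
Test <- data.frame(X1 = rnorm(20), X2 = rnorm(20), X3 = rnorm(20))
Test$Y <- 1 + 2 * Test$X1 - Test$X2 + rnorm(20)

# Assumed form of XT2: the right-hand side as a single string.
XT2 <- "X1 + X2"

fit_formula <- lm(as.formula(paste0("Y ~ ", XT2)), data = Test)
fit_subset  <- lm(Y ~ ., data = Test[, c("Y", "X1", "X2")])

all.equal(coef(fit_formula), coef(fit_subset))
```

Subsetting the columns keeps the formula fixed at Y ~ . and moves the variable selection into ordinary data frame indexing, which tends to be easier to drive from a character vector.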
Adding extra variables to a formula
Use reformulate and update like this:
update(part_A, reformulate(c(".", part_B)))
## y ~ x1 + x2 + x3
This also works:
v <- all.vars(part_A)
reformulate(c(v[-1], part_B), v[1])
## y ~ x1 + x2 + x3
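For a runnable version, part_A and part_B have to be defined first. The definitions below are assumptions chosen to match the output shown (a formula y ~ x1 + x2 and the extra term "x3"); f1 and f2 are names introduced for illustration.

```r
# Assumed inputs: an existing formula and extra term names as strings.
part_A <- y ~ x1 + x2
part_B <- "x3"

# reformulate(c(".", part_B)) yields ~. + x3, which update() merges in.
f1 <- update(part_A, reformulate(c(".", part_B)))

# Alternative: rebuild from the variable names of part_A.
v  <- all.vars(part_A)
f2 <- reformulate(c(v[-1], part_B), v[1])

f1
f2
```

Note that the second route goes through all.vars, so it only reproduces part_A faithfully when its terms are plain variable names rather than calls such as log(x1).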
Showing string in formula and not as variable in lm fit
How about eval(call("lm", sformula))?
lm(sformula)
#Call:
#lm(formula = sformula)
eval(call("lm", sformula))
#Call:
#lm(formula = "y~x")
Generally speaking, there is a data argument for lm. Let's do:
mydata <- data.frame(y = y, x = x)
eval(call("lm", sformula, quote(mydata)))
#Call:
#lm(formula = "y~x", data = mydata)
The above call() + eval() combination can be replaced by do.call():
do.call("lm", list(formula = sformula))
#Call:
#lm(formula = "y~x")
do.call("lm", list(formula = sformula, data = quote(mydata)))
#Call:
#lm(formula = "y~x", data = mydata)
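The role of quote(mydata) is easiest to see by comparing it with passing the data frame itself. This sketch uses made-up x and y vectors; fit_quoted and fit_inline are hypothetical names.

```r
# Made-up data; only the names y, x and mydata follow the answer above.
set.seed(1)
x <- rnorm(10)
y <- 2 * x + rnorm(10)
mydata   <- data.frame(y = y, x = x)
sformula <- "y~x"

# With quote(), the stored call keeps the symbol mydata.
fit_quoted <- do.call("lm", list(formula = sformula, data = quote(mydata)))
fit_quoted$call

# Without quote(), the whole data frame is embedded in the stored call.
fit_inline <- do.call("lm", list(formula = sformula, data = mydata))
is.data.frame(fit_inline$call$data)
```

Both fits have the same coefficients; the difference is only in how readable the printed call is.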