How to automatically include all 2-way interactions in a glm model in R
You can get all two-way interactions simply by using .*. in the formula (for example y ~ .*.), and arbitrary n-way interactions by writing y ~ .^n. Calling formula(g) on the fitted model g will tell you the expanded version of the formula in each of these cases.
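A minimal sketch on made-up data (the columns x1, x2, x3 and the binary outcome y are assumptions for illustration). The "term.labels" attribute of the model's terms lists each main effect and interaction individually, so you can see exactly what .*. and .^3 expand to.

```r
# Toy data: binary outcome y and three numeric predictors (made up for illustration)
set.seed(42)
df <- data.frame(x1 = rnorm(50), x2 = rnorm(50), x3 = rnorm(50))
df$y <- rbinom(50, 1, 0.5)

# .*. expands to all main effects plus every two-way interaction
g <- glm(y ~ .*., data = df, family = binomial)

# One label per term: x1, x2, x3, x1:x2, x1:x3, x2:x3
labs <- attr(terms(g), "term.labels")
print(labs)

# .^3 additionally includes the three-way interaction x1:x2:x3
labs3 <- attr(terms(y ~ .^3, data = df), "term.labels")
print(labs3)
```

Note that .*. and .^2 describe the same set of terms; .^n is the more general spelling.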
Looping thru variables in a dataframe to create interactions
You don't need to create an explicit interaction column for each pair of variables. Instead, Col1 * Col2 in a model formula will generate the main effects and their interaction automatically. For example, if your outcome variable is y (which would be a column in your data frame), and you want a regression formula with all two-way interactions between the other columns, you could do:
preds = setdiff(names(df), "y")  # every column except the outcome
form = reformulate(apply(combn(preds, 2), 2, paste, collapse = "*"), response = "y")
form
y ~ Col1 * Col2 + Col1 * Col3 + Col2 * Col3
Then your regression model would be:
mod = lm(form, data=df)
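A self-contained sketch of the same approach on a hypothetical data frame with columns Col1, Col2, Col3 and y. It uses setdiff() to exclude the outcome; a pattern match such as grep("y", names(df)) would also drop any predictor whose name merely contains the letter "y" (e.g. "year"). For comparison it fits the dot shorthand y ~ .^2, which should give the same fitted values.

```r
# Toy data frame (hypothetical): outcome y plus three predictors
set.seed(1)
df <- data.frame(Col1 = rnorm(20), Col2 = rnorm(20), Col3 = rnorm(20), y = rnorm(20))

# Build "Col1*Col2", "Col1*Col3", "Col2*Col3" from every pair of predictors
preds <- setdiff(names(df), "y")
pairs <- apply(combn(preds, 2), 2, paste, collapse = "*")
form <- reformulate(pairs, response = "y")
print(form)

mod <- lm(form, data = df)

# The same model written with the dot shorthand
mod_dot <- lm(y ~ .^2, data = df)
```

Building the formula explicitly is mainly useful when you want to filter which pairs enter the model; when you want every pair, y ~ .^2 is shorter.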
How to make all interactions before using glmnet
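Yes, there is a convenient way to do that. It takes two steps: build a formula that contains all interactions, then expand that formula into a design matrix with model.matrix.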
library(glmnet)
# Sample data
data <- data.frame(matrix(rnorm(9 * 10), ncol = 9))
names(data) <- c(paste0("x", 1:8), "y")
# First step: using .*. for all interactions
f <- as.formula(y ~ .*.)
y <- data$y
# Second step: model.matrix expands f into one column per term;
# [, -1] drops the intercept column because glmnet fits its own intercept
x <- model.matrix(f, data)[, -1]
glmnet(x, y)
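To check what the design matrix contains before handing it to glmnet, you can inspect its dimensions. This sketch recreates the same kind of random data (the seed is an arbitrary choice) and uses only base R, so glmnet is not needed to run it.

```r
# Same shape of toy data as above: 10 rows, predictors x1..x8 and outcome y
set.seed(123)
data <- data.frame(matrix(rnorm(9 * 10), ncol = 9))
names(data) <- c(paste0("x", 1:8), "y")

x <- model.matrix(y ~ .*., data)[, -1]

# 8 main effects + choose(8, 2) = 28 pairwise interactions = 36 columns
print(dim(x))
print(head(colnames(x), 10))
```

With 36 columns and only 10 rows, ordinary least squares has no unique solution, which is one reason a penalized fit such as glmnet is a natural choice here.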
Performing pairwise interactions between all fields using recipes
I'm not sure if it's a perfect (or even good) solution, but I used the answer here to find the columns that contained NAs and then removed them wholesale.
So the bit after parsed_recipe was switched to this:
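library(recipes)  # bake()
library(rsample)  # training(), testing()
library(dplyr)    # %>%, select(), all_of()
# Bake the training split, then find every column that contains at least one NA
interim_train <- bake(parsed_recipe, new_data = training(partitions))
columns_to_remove <- colnames(interim_train)[colSums(is.na(interim_train)) > 0]
train_data <- interim_train %>%
  select(-all_of(columns_to_remove))
summary(train_data)
# Drop the same columns from the test split so both sets share the same predictors
test_data <- bake(parsed_recipe, new_data = testing(partitions)) %>%
  select(-all_of(columns_to_remove))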
Thus far it seems to be behaving in a more promising fashion.
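The column-dropping logic above does not depend on recipes itself. Here is a base R sketch of the same idea on two small hypothetical data frames standing in for the baked training and test sets; the key point is that the list of columns to drop is computed on the training data only and then applied to both sets.

```r
# Hypothetical baked train/test sets: column "b" has a missing value in training
interim_train <- data.frame(a = c(1, 2, 3, 4), b = c(2, NA, 5, 1), c = c(0, 1, 0, 1))
interim_test  <- data.frame(a = c(5, 6), b = c(3, 4), c = c(1, 0))

# Columns with at least one NA in the training data
columns_to_remove <- colnames(interim_train)[colSums(is.na(interim_train)) > 0]

# Keep the remaining columns in both sets so they have identical predictors
keep <- setdiff(colnames(interim_train), columns_to_remove)
train_data <- interim_train[, keep, drop = FALSE]
test_data  <- interim_test[, keep, drop = FALSE]

print(columns_to_remove)
print(names(train_data))
```

Dropping whole columns is a blunt fix; an imputation step in the recipe is an alternative if those columns carry useful signal.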
Calculate all possible interactions in model_matrix
You can use as.formula with paste0 to build the formula inside model_matrix:
library(modelr)  # provides model_matrix()
model_matrix(data, as.formula(paste0("~ .^", ndim)))  # ndim = highest interaction order
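The same formula-building trick works with base R's model.matrix, which is handy if modelr is not installed. In this sketch the data frame with columns a, b, c and the value ndim = 3 are assumptions for illustration.

```r
# Toy numeric data with three columns (hypothetical)
set.seed(7)
data <- data.frame(a = rnorm(5), b = rnorm(5), c = rnorm(5))
ndim <- 3  # highest interaction order to include

# paste0 produces the string "~ .^3", which as.formula turns into a formula
f <- as.formula(paste0("~ .^", ndim))
mm <- model.matrix(f, data)

# Intercept, 3 main effects, 3 two-way and 1 three-way interaction
print(colnames(mm))
```

Because ndim is just a number pasted into the string, the interaction order can be chosen at run time, for example inside a loop over several orders.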