How to Automatically Include All 2-Way Interactions in a Glm Model in R

You can include all two-way interactions by writing .*. in the formula, and all interactions up to order n by writing .^n. For a fitted model g, formula(g) shows the expanded version of the formula in each of these cases.
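As a minimal sketch of the two shortcuts, the example below fits both forms on simulated data and inspects the expanded terms. The data frame df, its columns a, b, c, and the binomial family are assumptions made for illustration only.

```r
# Simulated data: a binary outcome y and three numeric predictors (all hypothetical).
set.seed(42)
df <- data.frame(
  y = rbinom(200, 1, 0.5),
  a = rnorm(200),
  b = rnorm(200),
  c = rnorm(200)
)

# .*. crosses every predictor with every other one (main effects + 2-way terms).
g2 <- glm(y ~ .*., data = df, family = binomial)

# .^3 includes every term up to the 3-way interaction.
g3 <- glm(y ~ .^3, data = df, family = binomial)

# formula() on the fitted model shows the dot expanded into explicit terms.
formula(g2)
formula(g3)

# Comparing term labels shows what the higher order adds.
labels2 <- attr(terms(g2), "term.labels")
labels3 <- attr(terms(g3), "term.labels")
setdiff(labels3, labels2)
```

With three predictors, the only term that .^3 adds over .*. is the single three-way interaction.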

Looping through variables in a dataframe to create interactions

You don't need to create an explicit interaction column for each pair of variables. Instead, Col1 * Col2 in a model formula expands to Col1 + Col2 + Col1:Col2, so the interaction is generated automatically. For example, if your outcome variable is y (a column in your data frame df) and you want a regression formula with all two-way interactions between the other columns, you could do:

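# setdiff() drops only the column named exactly "y";
# grep("y", ...) would also drop any column whose name merely contains a "y"
form <- reformulate(apply(combn(setdiff(names(df), "y"), 2), 2, paste, collapse = "*"), response = "y")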

form
y ~ Col1 * Col2 + Col1 * Col3 + Col2 * Col3

Then your regression model would be:

mod = lm(form, data=df)
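The following sketch runs the same idea end to end on simulated data and compares it with the y ~ .^2 shortcut. The data frame and the names predictors, pair_terms, mod_pairs and mod_dot are hypothetical, chosen only for this illustration.

```r
# Simulated data with an outcome y and three predictors (hypothetical).
set.seed(1)
df <- data.frame(y = rnorm(30), Col1 = rnorm(30), Col2 = rnorm(30), Col3 = rnorm(30))

# Every column except the outcome.
predictors <- setdiff(names(df), "y")

# One "A*B" string per pair of predictors.
pair_terms <- combn(predictors, 2, FUN = paste, collapse = "*")

# Assemble the strings into a formula with y as the response.
form <- reformulate(pair_terms, response = "y")
form

# Fit with the generated formula and with the dot shortcut.
mod_pairs <- lm(form, data = df)
mod_dot <- lm(y ~ .^2, data = df)

# Both models contain the same terms and the same coefficient estimates.
setequal(attr(terms(mod_pairs), "term.labels"), attr(terms(mod_dot), "term.labels"))
all.equal(coef(mod_pairs)[names(coef(mod_dot))], coef(mod_dot))
```

Building the formula explicitly is mainly useful when only a subset of the columns should be crossed. When every other column takes part, y ~ .^2 is shorter.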

How to make all interactions before using glmnet

Yes, there is a convenient way to do that. Two steps matter:

library(glmnet)
# Sample data
data <- data.frame(matrix(rnorm(9 * 10), ncol = 9))
names(data) <- c(paste0("x", 1:8), "y")
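# First step: use .*. to request all main effects and two-way interactions
f <- y ~ .*.
y <- data$y
# Second step: expand f into a numeric design matrix with model.matrix;
# [, -1] drops the intercept column, since glmnet fits its own intercept by default
x <- model.matrix(f, data)[, -1]
glmnet(x, y)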
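Before handing the matrix to glmnet, it can help to check what model.matrix actually produced. The sketch below uses only base R and the same simulated layout as above; n_main and n_pairs are hypothetical helper names.

```r
# Same layout as the sample data: 8 predictors x1..x8 and an outcome y.
set.seed(123)
data <- data.frame(matrix(rnorm(9 * 10), ncol = 9))
names(data) <- c(paste0("x", 1:8), "y")

# Expand main effects and all two-way interactions, then drop the intercept column.
x <- model.matrix(y ~ .*., data)[, -1]

# Rows are observations; columns are main effects followed by interaction terms.
dim(x)
head(colnames(x), 10)

# 8 main effects plus choose(8, 2) pairwise interactions.
n_main <- 8
n_pairs <- choose(8, 2)
ncol(x) == n_main + n_pairs
```

Here the matrix has more columns than rows, a setting where a penalized fit such as glmnet is typically used.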

Performing pairwise interactions between all fields using recipes

I'm not sure if it's a perfect (or even good) solution, but I used the answer here to find the columns that contained NAs and then removed them wholesale.

So the bit after parsed_recipe was switched to this:

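# parsed_recipe and partitions come from the earlier part of the workflow
interim_train <- bake(parsed_recipe, new_data = training(partitions))

# Names of the columns that contain at least one NA after baking
columns_to_remove <- colnames(interim_train)[colSums(is.na(interim_train)) > 0]

train_data <- interim_train %>%
  select(-all_of(columns_to_remove))

summary(train_data)

# Drop the same columns from the test set so both sets keep matching columns
test_data <- bake(parsed_recipe, new_data = testing(partitions)) %>%
  select(-all_of(columns_to_remove))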

Thus far it seems to be behaving in a more promising fashion.
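The column-dropping step can be tried on its own, without recipes, to see what it does. This is a base R sketch on two small made-up data frames; train, test, train_clean and test_clean are hypothetical names.

```r
# Toy training and test sets; column b has a missing value in the training set only.
train <- data.frame(a = c(1, 2, 3), b = c(NA, 5, 6), c = c(7, 8, 9))
test <- data.frame(a = c(4, 5), b = c(1, 2), c = c(3, 4))

# Columns with at least one NA in the training data.
columns_to_remove <- colnames(train)[colSums(is.na(train)) > 0]

# Remove those columns from both sets so their layouts stay identical.
train_clean <- train[, setdiff(colnames(train), columns_to_remove), drop = FALSE]
test_clean <- test[, setdiff(colnames(test), columns_to_remove), drop = FALSE]

colnames(train_clean)
colnames(test_clean)
```

The list of columns is derived from the training set only and then reused for the test set, which mirrors the approach in the answer.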

Calculate all possible interactions in model_matrix

You can use as.formula with paste0 in model_matrix (from the modelr package):

model_matrix(data, as.formula(paste0("~ .^", ndim)))
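To illustrate how the formula string is assembled, here is a sketch that swaps in base R's model.matrix for model_matrix so it runs without extra packages. The data frame and the value of ndim are assumptions for the example.

```r
# Three hypothetical numeric columns.
set.seed(7)
data <- data.frame(u = rnorm(20), v = rnorm(20), w = rnorm(20))

# Highest interaction order to include.
ndim <- 2

# Build the one-sided formula "~ .^2" from the string.
f <- as.formula(paste0("~ .^", ndim))

# Base R equivalent of the design-matrix step (includes an intercept column).
mm <- model.matrix(f, data)
colnames(mm)
```

Raising ndim to 3 would add the single u:v:w column to the result.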

