Model Matrix with All Pairwise Interactions Between Columns


If you mean in a model formula, then the ^ operator does this.

## dummy data
set.seed(1)
dat <- data.frame(Y = rnorm(10), x = rnorm(10), y = rnorm(10), z = rnorm(10))

The formula is

form <- Y ~ (x + y + z)^2

which gives the following (via model.matrix(), which the standard model-fitting functions use internally):

model.matrix(form, data = dat)

R> form <- Y ~ (x + y + z)^2
R> form
Y ~ (x + y + z)^2
R> model.matrix(form, data = dat)
   (Intercept)        x        y        z       x:y       x:z      y:z
1            1  1.51178  0.91898  1.35868  1.389293  2.054026  1.24860
2            1  0.38984  0.78214 -0.10279  0.304911 -0.040071 -0.08039
3            1 -0.62124  0.07456  0.38767 -0.046323 -0.240837  0.02891
4            1 -2.21470 -1.98935 -0.05381  4.405817  0.119162  0.10704
5            1  1.12493  0.61983 -1.37706  0.697261 -1.549097 -0.85354
6            1 -0.04493 -0.05613 -0.41499  0.002522  0.018647  0.02329
7            1 -0.01619 -0.15580 -0.39429  0.002522  0.006384  0.06143
8            1  0.94384 -1.47075 -0.05931 -1.388149 -0.055982  0.08724
9            1  0.82122 -0.47815  1.10003 -0.392667  0.903364 -0.52598
10           1  0.59390  0.41794  0.76318  0.248216  0.453251  0.31896
attr(,"assign")
[1] 0 1 2 3 4 5 6
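As a quick check, each interaction column that model.matrix() produces is the elementwise product of the main-effect columns it combines. A minimal sketch, rebuilding the same dummy data so it runs on its own:

```r
## dummy data, as above
set.seed(1)
dat <- data.frame(Y = rnorm(10), x = rnorm(10), y = rnorm(10), z = rnorm(10))

mm <- model.matrix(Y ~ (x + y + z)^2, data = dat)

## each interaction column is the row-wise product of its components;
## unname() drops the row names that model.matrix() attaches
all.equal(unname(mm[, "x:y"]), dat$x * dat$y)
all.equal(unname(mm[, "x:z"]), dat$x * dat$z)
all.equal(unname(mm[, "y:z"]), dat$y * dat$z)
```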

If you don't know how many variables you have, or it is tedious to write them all out, use the . notation, which stands for every column in the data other than the response:

R> form <- Y ~ .^2
R> model.matrix(form, data = dat)
   (Intercept)        x        y        z       x:y       x:z      y:z
1            1  1.51178  0.91898  1.35868  1.389293  2.054026  1.24860
2            1  0.38984  0.78214 -0.10279  0.304911 -0.040071 -0.08039
3            1 -0.62124  0.07456  0.38767 -0.046323 -0.240837  0.02891
4            1 -2.21470 -1.98935 -0.05381  4.405817  0.119162  0.10704
5            1  1.12493  0.61983 -1.37706  0.697261 -1.549097 -0.85354
6            1 -0.04493 -0.05613 -0.41499  0.002522  0.018647  0.02329
7            1 -0.01619 -0.15580 -0.39429  0.002522  0.006384  0.06143
8            1  0.94384 -1.47075 -0.05931 -1.388149 -0.055982  0.08724
9            1  0.82122 -0.47815  1.10003 -0.392667  0.903364 -0.52598
10           1  0.59390  0.41794  0.76318  0.248216  0.453251  0.31896
attr(,"assign")
[1] 0 1 2 3 4 5 6

The "power" in the ^ operator, here 2, sets the maximum order of the interactions. With ^2 we get all main effects plus the second-order interactions of every pair of variables covered by the ^ operator. If you want interactions up to third order, use ^3:

R> form <- Y ~ .^3
R> head(model.matrix(form, data = dat))
  (Intercept)        x        y        z       x:y      x:z      y:z     x:y:z
1           1  1.51178  0.91898  1.35868  1.389293  2.05403  1.24860  1.887604
2           1  0.38984  0.78214 -0.10279  0.304911 -0.04007 -0.08039 -0.031341
3           1 -0.62124  0.07456  0.38767 -0.046323 -0.24084  0.02891 -0.017958
4           1 -2.21470 -1.98935 -0.05381  4.405817  0.11916  0.10704 -0.237055
5           1  1.12493  0.61983 -1.37706  0.697261 -1.54910 -0.85354 -0.960170
6           1 -0.04493 -0.05613 -0.41499  0.002522  0.01865  0.02329 -0.001047
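With p numeric predictors, ^k gives an intercept plus one column for every subset of 1 to k predictors, so the width is 1 + sum(choose(p, 1:k)). That grows quickly as p increases. A small sketch checking the count on the dummy data; expected_cols() is a hypothetical helper, and the formula only holds for numeric predictors (a factor contributes one column per contrast):

```r
## same dummy data as above: three numeric predictors x, y, z
set.seed(1)
dat <- data.frame(Y = rnorm(10), x = rnorm(10), y = rnorm(10), z = rnorm(10))
p <- ncol(dat) - 1  # number of predictors (the response Y is excluded)

## hypothetical helper: intercept + every subset of 1..k predictors
expected_cols <- function(p, k) 1 + sum(choose(p, seq_len(k)))

for (k in 1:3) {
  mm <- model.matrix(as.formula(paste("Y ~ .^", k)), data = dat)
  cat("order", k, ":", ncol(mm), "columns, expected", expected_cols(p, k), "\n")
}
```

For example, 20 numeric predictors with ^2 already give expected_cols(20, 2), i.e. 211 columns.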

How to make all interactions before using glmnet

Yes, there is a convenient way to do that, and it comes down to two steps:

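library(glmnet)
# Sample data: predictors x1..x8 and response y
set.seed(1)
data <- data.frame(matrix(rnorm(9 * 10), ncol = 9))
names(data) <- c(paste0("x", 1:8), "y")
# First step: . * . expands to all main effects plus all pairwise interactions
f <- y ~ . * .
y <- data$y
# Second step: model.matrix() builds the expanded design matrix from f;
# [, -1] drops the intercept column, since glmnet fits its own intercept
x <- model.matrix(f, data)[, -1]
glmnet(x, y)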
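Here . * . and .^2 expand to the same set of terms (main effects plus every pairwise interaction), and the response y is left out of the expansion. A quick base-R check with random stand-in data; it does not need glmnet:

```r
set.seed(42)
data <- data.frame(matrix(rnorm(9 * 10), ncol = 9))
names(data) <- c(paste0("x", 1:8), "y")

x_star  <- model.matrix(y ~ . * ., data)[, -1]  # drop intercept
x_power <- model.matrix(y ~ .^2, data)[, -1]

## 8 main effects + choose(8, 2) = 28 pairwise interactions = 36 columns
dim(x_star)
setequal(colnames(x_star), colnames(x_power))
```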

Create matrix using pairwise calculations between columns in R

The outer function will do this and keep track of the bookkeeping for you, but you have to pass it a vectorized function.

summin <- Vectorize(function(i, j) sum(pmin(ps[[i]], ps[[j]])))
outer(seq_len(ncol(ps)), seq_len(ncol(ps)), FUN=summin)
##      [,1] [,2]
## [1,] 1.01 0.98
## [2,] 0.98 1.00

I have no idea what is supposed to be going on in your v1 code; it doesn't look like you're summing the minimums any more.

If I were going to write the loop myself, I'd use expand.grid instead of combn: that gives me the diagonal as well, and I don't have to work out how to fill in both halves of the matrix, at the cost of doing every computation twice. (The computer can do it twice faster than I can figure out how to ask it to do it only once, anyway.) I'd also build the result as a vector and convert it to a matrix afterwards.

cc <- expand.grid(seq_len(ncol(d)), seq_len(ncol(d)))
out <- sapply(seq_len(nrow(cc)), function(k) {
    i <- cc[k,1]
    j <- cc[k,2]
    sum(pmin(d[[i]],d[[j]]))
})
out <- matrix(out, ncol=ncol(d))
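To see that the two approaches agree, here is a self-contained sketch on a small hypothetical data frame d (the real ps / d from the question are not shown, so the values are made up):

```r
## hypothetical toy data standing in for the question's data frame
d <- data.frame(a = c(0.2, 0.5, 0.3), b = c(0.4, 0.1, 0.5))

## approach 1: outer() with a vectorized helper
summin <- Vectorize(function(i, j) sum(pmin(d[[i]], d[[j]])))
m_outer <- outer(seq_len(ncol(d)), seq_len(ncol(d)), FUN = summin)

## approach 2: expand.grid() over all (i, j) pairs, then reshape;
## Var1 varies fastest, so column-major filling puts f(i, j) at [i, j]
cc <- expand.grid(seq_len(ncol(d)), seq_len(ncol(d)))
out <- sapply(seq_len(nrow(cc)), function(k) sum(pmin(d[[cc[k, 1]]], d[[cc[k, 2]]])))
m_grid <- matrix(out, ncol = ncol(d))

m_outer
all.equal(m_outer, m_grid)
```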

Performing pairwise interactions between all fields using recipes

I'm not sure if it's a perfect (or even good) solution, but I used the answer here to find the columns that contained NAs and then removed them wholesale.

So the bit after parsed_recipe was switched to this:

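library(recipes)  # bake()
library(rsample)  # training(), testing()
library(dplyr)    # select(), %>%

interim_train <- bake(parsed_recipe, new_data = training(partitions))

# Names of the columns that contain at least one NA
columns_to_remove <- colnames(interim_train)[colSums(is.na(interim_train)) > 0]

# all_of() is the tidyselect idiom for selecting with a character vector
train_data <- interim_train %>%
  select(-all_of(columns_to_remove))

summary(train_data)

# Drop the same columns from the test set so both sets line up
test_data <- bake(parsed_recipe, new_data = testing(partitions)) %>%
  select(-all_of(columns_to_remove))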

Thus far it seems to be behaving in a more promising fashion.
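The column-dropping step itself is plain base R and does not depend on recipes. A minimal sketch on a hypothetical data frame, where column b stands in for a baked column that picked up NAs:

```r
interim <- data.frame(
  a      = c(1, 2, 3),
  b      = c(4, NA, 6),     # contains an NA, so it gets dropped
  ab_int = c(0.5, 1, 1.5)
)

## names of columns with at least one NA
columns_to_remove <- colnames(interim)[colSums(is.na(interim)) > 0]

## keep only complete columns; drop = FALSE keeps a data frame
clean <- interim[, !(colnames(interim) %in% columns_to_remove), drop = FALSE]

columns_to_remove
names(clean)
```

Dropping whole columns discards everything in them; if only a few rows carry the NAs, removing those rows instead (for example with complete.cases()) may lose less information.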

R generate all possible interaction variables

What do you plan to do with all these interaction terms? There are several options; which is best depends on what you are trying to do.

If you want to pass the interactions to a modeling function such as lm or aov, it is very simple: just use the .^2 syntax:

fit <- lm( y ~ .^2, data=mydf )

The above will call lm and tell it to fit all the main effects and all two-way interactions for the variables in mydf, excluding y.
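A small sketch using columns of the built-in mtcars data as a hypothetical stand-in for mydf (the column choice is arbitrary), showing which coefficients .^2 produces:

```r
mydf <- mtcars[, c("mpg", "wt", "hp", "qsec")]  # hypothetical stand-in for mydf

fit <- lm(mpg ~ .^2, data = mydf)

## intercept + 3 main effects + choose(3, 2) = 3 two-way interactions
length(coef(fit))
names(coef(fit))
```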

If for some reason you really want to calculate all the interactions then you can use model.matrix:

tmp <- model.matrix( ~.^2, data=iris)

This will include a column for the intercept and columns for the main effects, but you can drop those if you don't want them.
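One way to keep only the interaction columns is to combine the "assign" attribute of the model matrix (which term each column belongs to, with 0 for the intercept) with the "order" attribute of the terms object. A sketch on iris:

```r
mm <- model.matrix(~ .^2, data = iris)
tt <- terms(~ .^2, data = iris)

## columns whose term has interaction order 2
is_interaction <- attr(mm, "assign") %in% which(attr(tt, "order") == 2)
inter_only <- mm[, is_interaction, drop = FALSE]

ncol(mm)          # intercept, main effects and interactions
ncol(inter_only)  # interactions only
```

A simpler alternative is grepl(":", colnames(mm)), which works as long as no variable name itself contains ":". Note that the factor Species contributes two columns to each of its interactions.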

If you need something other than a model fit, you can use the combn function, as @akrun mentions in the comments.
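A minimal sketch of the combn route on the numeric iris columns, building one product column per pair and naming it the way model.matrix would:

```r
num <- iris[, 1:4]  # numeric columns only

pairs_idx <- combn(names(num), 2)  # 2 x choose(4, 2) matrix of name pairs
inter <- apply(pairs_idx, 2, function(p) num[[p[1]]] * num[[p[2]]])
colnames(inter) <- apply(pairs_idx, 2, paste, collapse = ":")

dim(inter)
colnames(inter)
```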

Compute stepwise regression with all the pairwise interactions possible between variables

Just replace the + with *:

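Step <- train(Y ~ P * T * A, data = df,
              preProcess = c("center", "scale"),
              method = "lmStepAIC",
              # the control object must be passed as trControl; "repeats"
              # only takes effect with method = "repeatedcv"
              trControl = trainControl(method = "repeatedcv", repeats = 10),
              # train() has no na.rm argument; drop incomplete rows instead
              na.action = na.omit)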

Step:  AIC=575.39
.outcome ~ P + `T:A`

        Df Sum of Sq      RSS    AIC
<none>                6807627 575.39
- P      1   2094075  8901703 586.27
- `T:A`  1   7886150 14693778 610.32

EDIT

If you don't want the three-way interaction P:T:A to be tested, use:

Step <- train(Y ~ (P + T + A)^2, data = df,
              preProcess = c("center", "scale"),
              method = "lmStepAIC",
              trControl = trainControl(method = "repeatedcv", repeats = 10),
              na.action = na.omit)

The ^2 restricts the candidate terms to the main effects and the two-way interactions.
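The difference between the two formulas can be checked without any data or caret, by comparing the term labels that base R's terms() generates:

```r
full <- attr(terms(Y ~ P * T * A), "term.labels")
pair <- attr(terms(Y ~ (P + T + A)^2), "term.labels")

full
pair
setdiff(full, pair)  # the only term that ^2 leaves out
```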

Add all possible two-way interactions between two sets of variables - R

The formula interface lets you do that easily with the ^ operator: you could construct all the two-way interactions of two factor variables with (ethnicity + incgrp)^2, but that only applies if you follow R's factor conventions. It appears you are circumventing the proper use of formulas and factors by doing SAS-style dummy variable creation instead. For your situation, you might try:

glm(death ~ age + (black + hisp + other)*( rich + middle), family = binomial("probit"), data=data)

In a formula, both ^ and * construct interactions; they lose their usual arithmetic meaning. See ?formula.
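The expansion can be inspected with terms(). Crossing the two sets with * yields every between-set pair but no within-set pairs such as black:hisp, and wrapping an expression in I() restores arithmetic:

```r
f <- death ~ age + (black + hisp + other) * (rich + middle)
labs <- attr(terms(f), "term.labels")
labs  # age, 5 main effects, 3 x 2 = 6 between-set interactions

## ^ inside a formula is crossing, not arithmetic: x^2 collapses to x
attr(terms(y ~ x^2), "term.labels")     # "x"
attr(terms(y ~ I(x^2)), "term.labels")  # "I(x^2)"
```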

How to create a pairwise matrix with counts of matching entries for comparisons of all levels of one factor in a dataframe?

You could do it this way:

x = xtabs(~PLOT+INTERACTION,d)
x
        INTERACTION
    PLOT interact_type_1 interact_type_2 interact_type_3 interact_type_4
       A               1               1               0               0
       B               0               0               1               1
       C               1               0               0               0
       D               0               0               0               1
       E               1               1               1               1

Find the combinations of two among PLOT using combn:

n = length(unique(d$PLOT))
cmb = combn(1:n, 2)  # each column is one pair of row indices of x

Then construct your matrix and fill its lower half:

m = matrix(nrow=n,ncol=n)
## for each pair of plots in cmb, count the interaction types both rows of x share: sum(x[y[1],]*x[y[2],])
m[lower.tri(m)] = apply(cmb,2,function(y) sum(x[y[1],]*x[y[2],]))

This returns:

      [,1] [,2] [,3] [,4] [,5]
[1,]   NA   NA   NA   NA   NA
[2,]    0   NA   NA   NA   NA
[3,]    1    0   NA   NA   NA
[4,]    0    1    0   NA   NA
[5,]    2    2    1    1   NA
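The same counts, for every pair at once and with both halves filled, come from the matrix product x %*% t(x), i.e. tcrossprod(x). A sketch on a hypothetical d rebuilt from the table above; it assumes each PLOT/INTERACTION combination occurs at most once (otherwise use (x > 0) * 1 first):

```r
## hypothetical data reproducing the PLOT x INTERACTION table above
d <- data.frame(
  PLOT = c("A", "A", "B", "B", "C", "D", "E", "E", "E", "E"),
  INTERACTION = paste0("interact_type_", c(1, 2, 3, 4, 1, 4, 1, 2, 3, 4))
)
x <- xtabs(~ PLOT + INTERACTION, d)

## shared interaction types for every pair of plots;
## the diagonal is the number of types in each plot
shared <- tcrossprod(unclass(x))
shared
```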

regarding building regression models including interaction effects in lm

You can specify the highest order of interactions with ^.

y ~ (x[,1] + x[,2] + x[,3]) ^ 2

results in all two-variable interactions and main effects.
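A sketch with a hypothetical 10 x 3 predictor matrix x, showing that converting to a data frame and using .^2 fits the same model with tidier coefficient names (X1, X2, X3, X1:X2, ...):

```r
set.seed(1)
x <- matrix(rnorm(30), ncol = 3)  # hypothetical predictors
y <- rnorm(10)

fit1 <- lm(y ~ (x[, 1] + x[, 2] + x[, 3])^2)

## same design, with data.frame() naming the matrix columns X1..X3
dd <- data.frame(y, x)
fit2 <- lm(y ~ .^2, data = dd)

names(coef(fit2))
all.equal(unname(coef(fit1)), unname(coef(fit2)))
```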
