Model matrix with all pairwise interactions between columns
If you mean in a model formula, then the ^ operator does this.
## dummy data
set.seed(1)
dat <- data.frame(Y = rnorm(10), x = rnorm(10), y = rnorm(10), z = rnorm(10))
The formula is
form <- Y ~ (x + y + z)^2
which gives (using model.matrix(), which is used internally by the standard model fitting functions)
model.matrix(form, data = dat)
R> form <- Y ~ (x + y + z)^2
R> form
Y ~ (x + y + z)^2
R> model.matrix(form, data = dat)
(Intercept) x y z x:y x:z y:z
1 1 1.51178 0.91898 1.35868 1.389293 2.054026 1.24860
2 1 0.38984 0.78214 -0.10279 0.304911 -0.040071 -0.08039
3 1 -0.62124 0.07456 0.38767 -0.046323 -0.240837 0.02891
4 1 -2.21470 -1.98935 -0.05381 4.405817 0.119162 0.10704
5 1 1.12493 0.61983 -1.37706 0.697261 -1.549097 -0.85354
6 1 -0.04493 -0.05613 -0.41499 0.002522 0.018647 0.02329
7 1 -0.01619 -0.15580 -0.39429 0.002522 0.006384 0.06143
8 1 0.94384 -1.47075 -0.05931 -1.388149 -0.055982 0.08724
9 1 0.82122 -0.47815 1.10003 -0.392667 0.903364 -0.52598
10 1 0.59390 0.41794 0.76318 0.248216 0.453251 0.31896
attr(,"assign")
[1] 0 1 2 3 4 5 6
If you don't know how many variables you have, or it is tedious to write them all out, use the . notation instead; it stands for every column of data except the response:
R> form <- Y ~ .^2
R> model.matrix(form, data = dat)
(Intercept) x y z x:y x:z y:z
1 1 1.51178 0.91898 1.35868 1.389293 2.054026 1.24860
2 1 0.38984 0.78214 -0.10279 0.304911 -0.040071 -0.08039
3 1 -0.62124 0.07456 0.38767 -0.046323 -0.240837 0.02891
4 1 -2.21470 -1.98935 -0.05381 4.405817 0.119162 0.10704
5 1 1.12493 0.61983 -1.37706 0.697261 -1.549097 -0.85354
6 1 -0.04493 -0.05613 -0.41499 0.002522 0.018647 0.02329
7 1 -0.01619 -0.15580 -0.39429 0.002522 0.006384 0.06143
8 1 0.94384 -1.47075 -0.05931 -1.388149 -0.055982 0.08724
9 1 0.82122 -0.47815 1.10003 -0.392667 0.903364 -0.52598
10 1 0.59390 0.41794 0.76318 0.248216 0.453251 0.31896
attr(,"assign")
[1] 0 1 2 3 4 5 6
The "power" in the ^ operator, here 2, controls the highest order of interaction. With ^2 we get the main effects plus the second-order interactions of all pairs of variables the ^ operator is applied to. If you want up to 3rd-order interactions, use ^3.
R> form <- Y ~ .^3
R> head(model.matrix(form, data = dat))
(Intercept) x y z x:y x:z y:z x:y:z
1 1 1.51178 0.91898 1.35868 1.389293 2.05403 1.24860 1.887604
2 1 0.38984 0.78214 -0.10279 0.304911 -0.04007 -0.08039 -0.031341
3 1 -0.62124 0.07456 0.38767 -0.046323 -0.24084 0.02891 -0.017958
4 1 -2.21470 -1.98935 -0.05381 4.405817 0.11916 0.10704 -0.237055
5 1 1.12493 0.61983 -1.37706 0.697261 -1.54910 -0.85354 -0.960170
6 1 -0.04493 -0.05613 -0.41499 0.002522 0.01865 0.02329 -0.001047
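To check what a given power expands to without building the whole matrix, one option is to inspect the term labels. This is a minimal sketch using base R's terms(); the data frame reuses the column names from the example above, and the values are arbitrary.

```r
# Toy data with the same column names as the example above (values are arbitrary).
set.seed(1)
dat <- data.frame(Y = rnorm(10), x = rnorm(10), y = rnorm(10), z = rnorm(10))

# terms() expands a formula; "term.labels" lists the resulting terms.
labels2 <- attr(terms(Y ~ .^2, data = dat), "term.labels")
labels3 <- attr(terms(Y ~ .^3, data = dat), "term.labels")

print(labels2)
print(setdiff(labels3, labels2))  # the terms that ^3 adds on top of ^2
```

The second print shows the only term that ^3 contributes beyond ^2 for three predictors, the three-way interaction.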
How to make all interactions before using glmnet
Yes, there is a convenient way to do that. Two steps matter.
library(glmnet)
# Sample data
data <- data.frame(matrix(rnorm(9 * 10), ncol = 9))
names(data) <- c(paste0("x", 1:8), "y")
# First step: using .*. for all interactions
f <- as.formula(y ~ .*.)
y <- data$y
# Second step: using model.matrix to take advantage of f
x <- model.matrix(f, data)[, -1]
glmnet(x, y)
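As a quick sanity check on the design matrix before handing it to glmnet, the sketch below compares the .*. formula with the .^2 form used elsewhere on this page and counts the resulting columns. It uses base R only; the data are simulated the same way as above, and the names x_star and x_pow are made up for illustration.

```r
# Simulated data: 8 predictors x1..x8 and a response y (values are arbitrary).
set.seed(42)
data <- data.frame(matrix(rnorm(9 * 10), ncol = 9))
names(data) <- c(paste0("x", 1:8), "y")

# Build the design matrix both ways and drop the intercept column.
x_star <- model.matrix(y ~ .*., data)[, -1]
x_pow  <- model.matrix(y ~ .^2, data)[, -1]

# Check that both formulas produce the same set of columns.
same_cols <- setequal(colnames(x_star), colnames(x_pow))
n_cols <- ncol(x_star)  # expected: 8 main effects + choose(8, 2) pairwise products

print(same_cols)
print(n_cols)
```

With p numeric predictors the matrix has p + choose(p, 2) columns, so it grows quickly; that is worth keeping in mind when the number of rows is small.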
Create matrix using pairwise calculations between columns in R
The outer function will do this and keep track of the bookkeeping for you, but you have to pass it a vectorized function.
summin <- Vectorize(function(i, j) sum(pmin(ps[[i]], ps[[j]])))
outer(seq_len(ncol(ps)), seq_len(ncol(ps)), FUN=summin)
## [,1] [,2]
## [1,] 1.01 0.98
## [2,] 0.98 1.00
I have no idea what's supposed to be going on in your v1 code; it doesn't look like you're summing the minimums anymore.
If I were going to loop myself, I'd use expand.grid instead of combn, as then I get the diagonals and don't have to figure out how to populate the two sides of the matrix, though at the expense of doing all the computations twice. (The computer can do it twice faster than I can figure out how to ask it to do it only once, anyway.) I'd also just build the result as a vector and convert it to a matrix afterwards.
cc <- expand.grid(seq_len(ncol(d)), seq_len(ncol(d)))
out <- sapply(seq_len(nrow(cc)), function(k) {
  i <- cc[k, 1]
  j <- cc[k, 2]
  sum(pmin(d[[i]], d[[j]]))
})
out <- matrix(out, ncol = ncol(d))
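Since ps and d come from the question and are not shown here, the following self-contained sketch applies the outer approach to a small made-up data frame. The column names and values are assumptions for illustration only.

```r
# Hypothetical data: three columns of non-negative values.
d <- data.frame(a = c(0.2, 0.5, 0.3),
                b = c(0.1, 0.6, 0.3),
                c = c(0.4, 0.4, 0.2))

# outer() passes whole index vectors, so the scalar function must be vectorized.
summin <- Vectorize(function(i, j) sum(pmin(d[[i]], d[[j]])))

res <- outer(seq_len(ncol(d)), seq_len(ncol(d)), FUN = summin)
dimnames(res) <- list(names(d), names(d))
print(res)
```

The diagonal equals the column sums, because the elementwise minimum of a column with itself is the column.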
Performing pairwise interactions between all fields using recipes
I'm not sure if it's a perfect (or even good) solution, but I used the answer here to find the columns that contained NAs and then removed them wholesale.
So the bit after parsed_recipe was switched to this:
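interim_train <- bake(parsed_recipe, new_data = training(partitions))
# names of the columns that contain at least one NA in the baked training data
columns_to_remove <- colnames(interim_train)[colSums(is.na(interim_train)) > 0]
train_data <- interim_train %>%
  select(-all_of(columns_to_remove))
summary(train_data)
# drop the same columns from the test set so both sets keep identical columns
test_data <- bake(parsed_recipe, new_data = testing(partitions)) %>%
  select(-all_of(columns_to_remove))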
Thus far it seems to be behaving in a more promising fashion.
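The column-dropping step can be tried in isolation with base R. This is only a sketch on made-up data frames (train_raw and test_raw are hypothetical stand-ins for the baked training and test sets), not the recipes workflow itself.

```r
# Hypothetical stand-ins for the baked training and test sets.
train_raw <- data.frame(a = c(1, 2, 3), b = c(NA, 5, 6), c = c(7, 8, 9))
test_raw  <- data.frame(a = c(4, 5),    b = c(1, 2),     c = c(3, NA))

# Columns with at least one NA, decided on the training data only.
columns_to_remove <- colnames(train_raw)[colSums(is.na(train_raw)) > 0]

# Drop the same columns from both sets so they stay aligned.
train_clean <- train_raw[, !(colnames(train_raw) %in% columns_to_remove), drop = FALSE]
test_clean  <- test_raw[,  !(colnames(test_raw)  %in% columns_to_remove), drop = FALSE]

print(columns_to_remove)
print(anyNA(test_clean))
```

Note that a column with NAs only in the test set survives this filter, so the test data may still need separate handling.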
R generate all possible interaction variables
What do you plan to do with all these interaction terms? There are several options; which is best depends on what you are trying to do.
If you want to pass the interactions to a modeling function like lm or aov, then it is very simple: just use the .^2 syntax:
fit <- lm( y ~ .^2, data=mydf )
The above will call lm and tell it to fit all the main effects and all two-way interactions for the variables in mydf, excluding y.
If for some reason you really want to calculate all the interactions yourself, then you can use model.matrix:
tmp <- model.matrix( ~.^2, data=iris)
This will include a column for the intercept and columns for the main effects, but you can drop those if you don't want them.
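One way to drop the intercept and main-effect columns is to use the "assign" attribute of the model matrix together with the "order" attribute of the terms object. This is a sketch of that idea in base R; the names col_order and interactions_only are made up here.

```r
mm <- model.matrix(~ .^2, data = iris)

# Order of each term: 1 = main effect, 2 = two-way interaction.
term_order <- attr(terms(~ .^2, data = iris), "order")

# "assign" maps each column to its term (0 = intercept); look up that term's order.
col_order <- c(0, term_order)[attr(mm, "assign") + 1]

# Keep only the columns that belong to two-way interaction terms.
interactions_only <- mm[, col_order == 2, drop = FALSE]
print(colnames(interactions_only))
```

Filtering on the term order rather than on a ":" in the column name avoids surprises when a variable name itself contains a colon.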
If you need something different from the modeling then you can use the combn function as @akrun mentions in the comments.
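For numeric columns, the combn route might look like the sketch below. The data frame mydf here is made up, and the helper names pair_names and prods are hypothetical.

```r
# Hypothetical numeric data.
set.seed(1)
mydf <- data.frame(a = rnorm(5), b = rnorm(5), c = rnorm(5))

# Each column of pair_names holds one pair of variable names.
pair_names <- combn(names(mydf), 2)

# Elementwise product for every pair; one result column per pair.
prods <- apply(pair_names, 2, function(p) mydf[[p[1]]] * mydf[[p[2]]])
colnames(prods) <- apply(pair_names, 2, paste, collapse = ":")

print(prods)
```

This only covers numeric variables; for factors, model.matrix handles the dummy coding that a plain product would not.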
Compute stepwise regression with all the pairwise interactions possible between variables
Just replace the + by a *:
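Step <- train(Y ~ P*T*A, data = df,
              preProcess = c("center", "scale"),
              method = "lmStepAIC",
              # pass the resampling setup by name; repeats is only used by "repeatedcv"
              trControl = trainControl(method = "repeatedcv", repeats = 10),
              # train() handles missing values through na.action, not na.rm
              na.action = na.omit)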
Step: AIC=575.39
.outcome ~ P + `T:A`
Df Sum of Sq RSS AIC
<none> 6807627 575.39
- P 1 2094075 8901703 586.27
- `T:A` 1 7886150 14693778 610.32
EDIT
If you don't want A:T:P to be tested, then use:
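Step <- train(Y ~ (P + T + A)^2, data = df,
              preProcess = c("center", "scale"),
              method = "lmStepAIC",
              # pass the resampling setup by name; repeats is only used by "repeatedcv"
              trControl = trainControl(method = "repeatedcv", repeats = 10),
              # train() handles missing values through na.action, not na.rm
              na.action = na.omit)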
The ^2 keeps the main effects and the two-way interactions only, so the three-way term is left out.
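The difference between the two formulas can be checked without fitting anything. This is a small sketch using base R's terms() on the two formulas from this answer.

```r
# Term labels for the full crossing and for the ^2 version.
full    <- attr(terms(Y ~ P * T * A), "term.labels")
two_way <- attr(terms(Y ~ (P + T + A)^2), "term.labels")

print(full)
print(two_way)
print(setdiff(full, two_way))  # what * adds beyond ^2
```

Because terms() only parses the formula, this runs even when the variables P, T and A do not exist in the workspace.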
Add all possible two-way interactions between two sets of variables - R
The formula interface lets you do that easily with the ^ operator: you could construct all the two-way interactions from two factor variables with (ethnicity + incgrp)^2, but that only applies if you use the R factor conventions. It appears you are attempting to circumvent the proper use of formulas and factors by doing SAS-style dummy variable creation instead. For your situation, you might try:
glm(death ~ age + (black + hisp + other)*( rich + middle), family = binomial("probit"), data=data)
The formula interpretation uses both ^ and * to construct interactions. Inside a formula they lose their conventional mathematical meaning. See ?formula.
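To see which terms the crossed-sets formula above produces, its expansion can be inspected with terms(). This is a sketch on the formula alone; no data are needed because the formula contains no dot.

```r
f <- death ~ age + (black + hisp + other) * (rich + middle)

# Expanded terms: main effects plus every between-set pair.
labs <- attr(terms(f), "term.labels")
print(labs)
```

The expansion contains each ethnicity dummy crossed with each income dummy, but no within-set terms such as black:hisp, which is the difference from wrapping all five variables in ^2.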
How to create a pairwise matrix with counts of matching entries for comparisons of all levels of one factor in a dataframe?
You could do it this way:
x = xtabs(~PLOT+INTERACTION,d)
INTERACTION
PLOT interact_type_1 interact_type_2 interact_type_3 interact_type_4
A 1 1 0 0
B 0 0 1 1
C 1 0 0 0
D 0 0 0 1
E 1 1 1 1
Find the combinations of two among PLOT using combn:
n = length(unique(d$PLOT))
c = combn(1:n,2)
Then construct your matrix and fill its lower half:
m = matrix(nrow=n,ncol=n)
## for each possible combination of two present in c, we find for the corresponding rows in x how many 1s they have in common using sum(x[y[1],]*x[y[2],])
m[lower.tri(m)] = apply(c,2,function(y) sum(x[y[1],]*x[y[2],]))
This returns:
[,1] [,2] [,3] [,4] [,5]
[1,] NA NA NA NA NA
[2,] 0 NA NA NA NA
[3,] 1 0 NA NA NA
[4,] 0 1 0 NA NA
[5,] 2 2 1 1 NA
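An alternative that fills the whole matrix at once is a matrix cross-product of the contingency table with itself. The sketch below swaps in tcrossprod() for the combn loop; the data frame d is reconstructed here from the table printed above, so treat it as an assumption about the original data.

```r
# Data reconstructed from the contingency table shown above (an assumption).
d <- data.frame(
  PLOT = c("A", "A", "B", "B", "C", "D", "E", "E", "E", "E"),
  INTERACTION = paste0("interact_type_", c(1, 2, 3, 4, 1, 4, 1, 2, 3, 4))
)

# Plain 0/1 count matrix: rows are plots, columns are interaction types.
x <- unclass(table(d$PLOT, d$INTERACTION))

# Entry [i, j] is sum(x[i, ] * x[j, ]): the number of shared interaction types.
shared <- tcrossprod(x)
print(shared)
```

The result is symmetric, and its diagonal holds the number of interaction types per plot, so set the diagonal or upper triangle to NA afterwards if only the lower half is wanted.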
regarding building regression models including interaction effects in lm
You can specify the highest order of interactions with ^.
y ~ (x[,1] + x[,2] + x[,3]) ^ 2
results in all two-variable interactions and main effects.
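When the predictors sit in a matrix, it may be less tedious to put them in a data frame and use the dot notation rather than indexing each column. This is a sketch on simulated data; the object name xy is made up, and the column names X1 to X3 are the defaults data.frame() assigns to an unnamed matrix.

```r
# Simulated predictor matrix and response (values are arbitrary).
set.seed(1)
x <- matrix(rnorm(60), ncol = 3)
y <- rnorm(20)

# data.frame() names the columns of an unnamed matrix X1, X2, X3.
xy <- data.frame(y = y, x)

# Main effects plus all two-way interactions of every column except y.
fit <- lm(y ~ .^2, data = xy)
print(names(coef(fit)))
```

This also gives the coefficients readable names instead of labels containing the indexing expression.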