What is R's crossproduct function?
According to the help function in R: crossprod (X,Y) = t(X)%*% Y is a faster implementation than the expression itself. It is a function of two matrices, and if you have two vectors corresponds to the dot product. @Hong-Ooi's comments explains why it is called crossproduct.
R - Compute Cross Product of Vectors (Physics)
Here is a generalized cross product:
xprod <- function(...) {
args <- list(...)
# Check for valid arguments
if (length(args) == 0) {
stop("No data supplied")
}
len <- unique(sapply(args, FUN=length))
if (length(len) > 1) {
stop("All vectors must be the same length")
}
if (len != length(args) + 1) {
stop("Must supply N-1 vectors of length N")
}
# Compute generalized cross product by taking the determinant of sub-matricies
m <- do.call(rbind, args)
sapply(seq(len),
FUN=function(i) {
det(m[,-i,drop=FALSE]) * (-1)^(i+1)
})
}
For your example:
> xprod(1:3, 4:6)
[1] -3 6 -3
This works for any dimension:
> xprod(c(0,1)) # 2d
[1] 1 0
> xprod(c(1,0,0), c(0,1,0)) # 3d
[1] 0 0 1
> xprod(c(1,0,0,0), c(0,1,0,0), c(0,0,1,0)) # 4d
[1] 0 0 0 -1
See https://en.wikipedia.org/wiki/Cross_product
crossprod() in R syntax
Don't get too hung-up on the crossprod
in the paired
function. All it does is to check x[x] - 1:length(x)
is a zero vector (i.e. the condition for "perfect pairing"). It could be coded differently and faster (see paired2
or paired3
):
> n <- 8
> set.seed(17)
> x <- replicate(1e6, sample(1:n, n))
>
> paired <- function(x) crossprod(x[x] - 1:length(x))==0
> paired2 <- function(x) sum(x[x]==1:length(x))==length(x)
> paired3 <- function(x) sum(abs(x[x]-1:length(x)))==0
>
> system.time(i.paired <- apply(x, 2, paired))
user system elapsed
9.812 0.000 9.821
> system.time(i.paired2 <- apply(x, 2, paired2))
user system elapsed
4.548 0.000 4.550
> system.time(i.paired3 <- apply(x, 2, paired3))
user system elapsed
4.617 0.000 4.617
>
> all.equal(i.paired,i.paired2)
[1] TRUE
> all.equal(i.paired,i.paired3)
[1] TRUE
Efficient way to calculate cross products, multiply matrix and sum it up
Each element Z[i,j]
can be written as a bilinear form. The rest is: puttig together all similar calculations for the matrix Z
.
You can do:
Z <- t(X) %*% Y %*% X ### or
Z <- crossprod(X, Y) %*% X
To compare this calculation with your code:
set.seed(42)
n <- 27
X <- matrix(runif(n*n, 0, 1), nrow=n, ncol=n)
Y <- matrix(runif(n*n, 0, 1), nrow=n, ncol=n)
Z <- matrix(nrow=n, ncol=n)
for (i in 1:ncol(X)) {
for (j in 1:ncol(X)) {
cp <- tcrossprod(X[,i], X[,j])
Z[i,j] <- sum(cp * Y)
}
}
Z2 <- t(X) %*% Y %*% X
Z3 <- crossprod(X, Y) %*% X
sum(abs(Z2-Z))
sum(abs(Z3-Z))
If L
is the list of your 13 matrices X. you can do:
lapply(L, function(X) crossprod(X, Y) %*% X)
Here is the benchmarking:
Z1 <- function(X) {
Z <- matrix(nrow=27, ncol=27)
for (i in 1:ncol(X)) {
for (j in 1:ncol(X)) {
cp <- tcrossprod(X[,i], X[,j])
Z[i,j] <- sum(cp * Y)
}
}
return(Z)
}
library("microbenchmark")
microbenchmark(Z1=Z1(X), Z2=t(X) %*% Y %*% X, Z3=crossprod(X, Y) %*% X)
#> microbenchmark(Z1=Z1(X), Z2=t(X) %*% Y %*% X, Z3=crossprod(X, Y) %*% X)
#Unit: microseconds
# expr min lq mean median uq max neval cld
# Z1 3563.167 3671.6355 4391.00888 3721.3380 3874.617 9423.808 100 b
# Z2 26.558 27.3420 34.31214 35.5865 39.815 56.426 100 a
# Z3 24.779 25.1675 27.43546 26.0965 28.034 47.268 100 a
The solutions from Ronak are not faster than the original code, i.e. they are loop-hiding:
fun <- function(x, y) sum(tcrossprod(X[,x], X[,y]) *Y)
microbenchmark(Z1=Z1(X),
R1=outer(seq_len(ncol(X)), seq_len(ncol(X)), Vectorize(fun)),
R2=t(sapply(seq_len(ncol(X)), function(x)
sapply(seq_len(ncol(X)), function(y) sum(tcrossprod(X[,x], X[,y]) *Y)))),
R3=t(apply(X, 2, function(x) apply(X, 2, function(y) sum(tcrossprod(x, y) *Y)))),
unit="relative")
# Unit: relative
# expr min lq mean median uq max neval cld
# Z1 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 100 a
# R1 1.207583 1.213846 1.195597 1.216147 1.223139 1.060187 100 ab
# R2 1.225521 1.230332 1.487811 1.230852 1.299253 13.140022 100 b
# R3 1.156546 1.158774 1.217766 1.160142 2.012623 1.098679 100 ab
Is there a way to apply a crossproduct to a new column in R using mutate?
You could do:
dataframe %>% mutate(d = as.matrix(.) %*% nums)
#> a b c d
#> 1 1 0 1 400
#> 2 1 1 1 600
#> 3 1 0 0 100
or
dataframe %>% mutate(d = crossprod(t(as.matrix(.)), nums))
#> a b c d
#> 1 1 0 1 400
#> 2 1 1 1 600
#> 3 1 0 0 100
R language cross-product combination of two string arrays
You can try
c(outer(a, b, FUN=paste0))
#[1] "2013-03-31" "2014-03-31" "2015-03-31" "2013-06-30" "2014-06-30"
#[6] "2015-06-30"
Or
do.call(paste0,expand.grid(a,b))
Or
sprintf('%s%s', rep(a, length(b)), rep(b, length(a)))
applying cross product (kronecker) many times
The lapply
gives a list of the ith row of the matrices and Reduce
kroneckers them together. The sapply
then assembles that into the final matrix.
N <- nrow(list0[[1]])
sapply(1:N, function(i) Reduce("%x%", init = 1, lapply(rev(list0), "[", i, TRUE)))
How to do a matrix calculation to get the cross products of variables
I'm still new at R, but here's a stab at it anyway. For fun! I have no idea if there's any hope for it to be fast. Probably it's quite naive...
First an example matrix x
of num_observations x num_features
of small random integers.
num_features <- 5
num_observations <- 20
features <- letters[1:num_features]
x <- replicate(num_features, sample(1:10, num_observations, replace = T))
colnames(x) <- features
All combinations of feature pairs:
combinations <- combn(features, 2)
num_combinations = ncol(combinations)
For each feature pair, we'll multiply the corresponding columns in x
. Reserving space for a new matrix where the multiplied columns will end up:
y <- matrix(NA, ncol = num_combinations, nrow = num_observations)
cn <- rep("?", num_combinations) # column names of new features
Multiplying the column combinations:
for (i in 1:num_combinations)
{
cn[i] <- paste(combinations[1,i], combinations[2,i], sep = ".")
y[,i] <- x[,combinations[1,i]] * x[,combinations[2,i]]
}
colnames(y) <- cn
Finally merging the original matrix and the additional features:
x <- cbind(x, y)
This only handles multiplication for simplicity, additional features created using division is of course similar.
UPDATE
A nice approach suggested by @nongkrong in the comments forgoes the explicit loop and simply does:
y <- combn(split(x, col(x)), 2, FUN = function(cols) cols[[1]] * cols[[2]])
x <- cbind(x, y)
It doesn't explicitly set the column names of the new features, but it is more elegant and more readable. In some quick timings I did it was also about 30% faster!
Cartesian product data frame
You can use expand.grid(A, B, C)
EDIT: an alternative to using do.call
to achieve the second part, is the function mdply
from the package plyr
:
library(plyr)
d = expand.grid(x = A, y = B, z = C)
d = mdply(d, f)
To illustrate its usage using a trivial function 'paste', you can try
d = mdply(d, 'paste', sep = '+');
Related Topics
How to Set Memory Limit in Rstudio (Desktop Version)
Plotting Functions on Top of Datapoints in R
Voronoi Diagram Polygons Enclosed in Geographic Borders
R Pheatmap: Change Annotation Colors and Prevent Graphics Window from Popping Up
Can Lapply Not Modify Variables in a Higher Scope
Find the Index of the Column in Data Frame That Contains the String as Value
Converting Date to a Day of Week in R
Porting Set Operations from R's Data Frames to Data Tables: How to Identify Duplicated Rows
R Stacked Bar Graph Plotting Geom_Text
Provide Shades Between Dates on X Axis
Why Are Lubridate Functions So Slow When Compared with As.Posixct
How to Check If a Vector Contains N Consecutive Numbers
R Subsetting a Data Frame into Multiple Data Frames Based on Multiple Column Values