What Is R's Crossproduct Function

What is R's crossproduct function?

According to the help function in R: crossprod (X,Y) = t(X)%*% Y is a faster implementation than the expression itself. It is a function of two matrices, and if you have two vectors corresponds to the dot product. @Hong-Ooi's comments explains why it is called crossproduct.

R - Compute Cross Product of Vectors (Physics)

Here is a generalized cross product:

xprod <- function(...) {
args <- list(...)

# Check for valid arguments

if (length(args) == 0) {
stop("No data supplied")
}
len <- unique(sapply(args, FUN=length))
if (length(len) > 1) {
stop("All vectors must be the same length")
}
if (len != length(args) + 1) {
stop("Must supply N-1 vectors of length N")
}

# Compute generalized cross product by taking the determinant of sub-matricies

m <- do.call(rbind, args)
sapply(seq(len),
FUN=function(i) {
det(m[,-i,drop=FALSE]) * (-1)^(i+1)
})
}

For your example:

> xprod(1:3, 4:6)
[1] -3 6 -3

This works for any dimension:

> xprod(c(0,1)) # 2d
[1] 1 0
> xprod(c(1,0,0), c(0,1,0)) # 3d
[1] 0 0 1
> xprod(c(1,0,0,0), c(0,1,0,0), c(0,0,1,0)) # 4d
[1] 0 0 0 -1

See https://en.wikipedia.org/wiki/Cross_product

crossprod() in R syntax

Don't get too hung-up on the crossprod in the paired function. All it does is to check x[x] - 1:length(x) is a zero vector (i.e. the condition for "perfect pairing"). It could be coded differently and faster (see paired2 or paired3):

> n <- 8
> set.seed(17)
> x <- replicate(1e6, sample(1:n, n))
>
> paired <- function(x) crossprod(x[x] - 1:length(x))==0
> paired2 <- function(x) sum(x[x]==1:length(x))==length(x)
> paired3 <- function(x) sum(abs(x[x]-1:length(x)))==0
>
> system.time(i.paired <- apply(x, 2, paired))
user system elapsed
9.812 0.000 9.821
> system.time(i.paired2 <- apply(x, 2, paired2))
user system elapsed
4.548 0.000 4.550
> system.time(i.paired3 <- apply(x, 2, paired3))
user system elapsed
4.617 0.000 4.617
>
> all.equal(i.paired,i.paired2)
[1] TRUE
> all.equal(i.paired,i.paired3)
[1] TRUE

Efficient way to calculate cross products, multiply matrix and sum it up

Each element Z[i,j] can be written as a bilinear form. The rest is: puttig together all similar calculations for the matrix Z.

You can do:

Z <- t(X) %*% Y %*% X  ### or
Z <- crossprod(X, Y) %*% X

To compare this calculation with your code:

set.seed(42)
n <- 27
X <- matrix(runif(n*n, 0, 1), nrow=n, ncol=n)
Y <- matrix(runif(n*n, 0, 1), nrow=n, ncol=n)

Z <- matrix(nrow=n, ncol=n)
for (i in 1:ncol(X)) {
for (j in 1:ncol(X)) {
cp <- tcrossprod(X[,i], X[,j])
Z[i,j] <- sum(cp * Y)
}
}

Z2 <- t(X) %*% Y %*% X
Z3 <- crossprod(X, Y) %*% X
sum(abs(Z2-Z))
sum(abs(Z3-Z))

If L is the list of your 13 matrices X. you can do:

lapply(L, function(X) crossprod(X, Y) %*% X)

Here is the benchmarking:

Z1 <- function(X) {
Z <- matrix(nrow=27, ncol=27)
for (i in 1:ncol(X)) {
for (j in 1:ncol(X)) {
cp <- tcrossprod(X[,i], X[,j])
Z[i,j] <- sum(cp * Y)
}
}
return(Z)
}

library("microbenchmark")

microbenchmark(Z1=Z1(X), Z2=t(X) %*% Y %*% X, Z3=crossprod(X, Y) %*% X)
#> microbenchmark(Z1=Z1(X), Z2=t(X) %*% Y %*% X, Z3=crossprod(X, Y) %*% X)
#Unit: microseconds
# expr min lq mean median uq max neval cld
# Z1 3563.167 3671.6355 4391.00888 3721.3380 3874.617 9423.808 100 b
# Z2 26.558 27.3420 34.31214 35.5865 39.815 56.426 100 a
# Z3 24.779 25.1675 27.43546 26.0965 28.034 47.268 100 a

The solutions from Ronak are not faster than the original code, i.e. they are loop-hiding:

fun <- function(x, y) sum(tcrossprod(X[,x], X[,y]) *Y)

microbenchmark(Z1=Z1(X),
R1=outer(seq_len(ncol(X)), seq_len(ncol(X)), Vectorize(fun)),
R2=t(sapply(seq_len(ncol(X)), function(x)
sapply(seq_len(ncol(X)), function(y) sum(tcrossprod(X[,x], X[,y]) *Y)))),
R3=t(apply(X, 2, function(x) apply(X, 2, function(y) sum(tcrossprod(x, y) *Y)))),
unit="relative")
# Unit: relative
# expr min lq mean median uq max neval cld
# Z1 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000 100 a
# R1 1.207583 1.213846 1.195597 1.216147 1.223139 1.060187 100 ab
# R2 1.225521 1.230332 1.487811 1.230852 1.299253 13.140022 100 b
# R3 1.156546 1.158774 1.217766 1.160142 2.012623 1.098679 100 ab

Is there a way to apply a crossproduct to a new column in R using mutate?

You could do:

dataframe %>% mutate(d = as.matrix(.) %*% nums)

#> a b c d
#> 1 1 0 1 400
#> 2 1 1 1 600
#> 3 1 0 0 100

or

dataframe %>% mutate(d = crossprod(t(as.matrix(.)), nums))

#> a b c d
#> 1 1 0 1 400
#> 2 1 1 1 600
#> 3 1 0 0 100

R language cross-product combination of two string arrays

You can try

c(outer(a, b, FUN=paste0))
#[1] "2013-03-31" "2014-03-31" "2015-03-31" "2013-06-30" "2014-06-30"
#[6] "2015-06-30"

Or

do.call(paste0,expand.grid(a,b))

Or

sprintf('%s%s', rep(a, length(b)), rep(b, length(a)))

applying cross product (kronecker) many times

The lapply gives a list of the ith row of the matrices and Reduce kroneckers them together. The sapply then assembles that into the final matrix.

N <- nrow(list0[[1]])
sapply(1:N, function(i) Reduce("%x%", init = 1, lapply(rev(list0), "[", i, TRUE)))

How to do a matrix calculation to get the cross products of variables

I'm still new at R, but here's a stab at it anyway. For fun! I have no idea if there's any hope for it to be fast. Probably it's quite naive...

First an example matrix x of num_observations x num_features of small random integers.

num_features <- 5
num_observations <- 20
features <- letters[1:num_features]

x <- replicate(num_features, sample(1:10, num_observations, replace = T))

colnames(x) <- features

All combinations of feature pairs:

combinations <- combn(features, 2)
num_combinations = ncol(combinations)

For each feature pair, we'll multiply the corresponding columns in x. Reserving space for a new matrix where the multiplied columns will end up:

y <- matrix(NA, ncol = num_combinations, nrow = num_observations)
cn <- rep("?", num_combinations) # column names of new features

Multiplying the column combinations:

for (i in 1:num_combinations)
{
cn[i] <- paste(combinations[1,i], combinations[2,i], sep = ".")
y[,i] <- x[,combinations[1,i]] * x[,combinations[2,i]]
}
colnames(y) <- cn

Finally merging the original matrix and the additional features:

x <- cbind(x, y)

This only handles multiplication for simplicity, additional features created using division is of course similar.

UPDATE

A nice approach suggested by @nongkrong in the comments forgoes the explicit loop and simply does:

y <- combn(split(x, col(x)), 2, FUN = function(cols) cols[[1]] * cols[[2]])
x <- cbind(x, y)

It doesn't explicitly set the column names of the new features, but it is more elegant and more readable. In some quick timings I did it was also about 30% faster!

Cartesian product data frame

You can use expand.grid(A, B, C)


EDIT: an alternative to using do.call to achieve the second part, is the function mdply from the package plyr:

library(plyr)

d = expand.grid(x = A, y = B, z = C)
d = mdply(d, f)

To illustrate its usage using a trivial function 'paste', you can try

d = mdply(d, 'paste', sep = '+');


Related Topics



Leave a reply



Submit