What Is R's Crossproduct Function

What is R's crossproduct function?

According to the help function in R: crossprod (X,Y) = t(X)%*% Y is a faster implementation than the expression itself. It is a function of two matrices, and if you have two vectors corresponds to the dot product. @Hong-Ooi's comments explains why it is called crossproduct.

R - Compute Cross Product of Vectors (Physics)

Here is a generalized cross product:

xprod <- function(...) {
  args <- list(...)

  # Check for valid arguments

  if (length(args) == 0) {
    stop("No data supplied")
  }
  len <- unique(sapply(args, FUN=length))
  if (length(len) > 1) {
    stop("All vectors must be the same length")
  }
  if (len != length(args) + 1) {
    stop("Must supply N-1 vectors of length N")
  }

  # Compute generalized cross product by taking the determinant of sub-matricies

  m <- do.call(rbind, args)
  sapply(seq(len),
         FUN=function(i) {
           det(m[,-i,drop=FALSE]) * (-1)^(i+1)
         })
}

For your example:

> xprod(1:3, 4:6)
[1] -3  6 -3

This works for any dimension:

> xprod(c(0,1)) # 2d
[1] 1 0
> xprod(c(1,0,0), c(0,1,0)) # 3d
[1] 0 0 1
> xprod(c(1,0,0,0), c(0,1,0,0), c(0,0,1,0)) # 4d
[1]  0  0  0 -1

See https://en.wikipedia.org/wiki/Cross_product

crossprod() in R syntax

Don't get too hung-up on the crossprod in the paired function. All it does is to check x[x] - 1:length(x) is a zero vector (i.e. the condition for "perfect pairing"). It could be coded differently and faster (see paired2 or paired3):

> n <- 8
> set.seed(17)
> x <- replicate(1e6, sample(1:n, n))
>
> paired <- function(x) crossprod(x[x] - 1:length(x))==0
> paired2 <- function(x) sum(x[x]==1:length(x))==length(x)
> paired3 <- function(x) sum(abs(x[x]-1:length(x)))==0
>
> system.time(i.paired  <- apply(x, 2, paired))
   user  system elapsed
  9.812   0.000   9.821
> system.time(i.paired2 <- apply(x, 2, paired2))
   user  system elapsed
  4.548   0.000   4.550
> system.time(i.paired3 <- apply(x, 2, paired3))
   user  system elapsed
  4.617   0.000   4.617
>
> all.equal(i.paired,i.paired2)
[1] TRUE
> all.equal(i.paired,i.paired3)
[1] TRUE

Efficient way to calculate cross products, multiply matrix and sum it up

Each element Z[i,j] can be written as a bilinear form. The rest is: puttig together all similar calculations for the matrix Z.

You can do:

Z <- t(X) %*% Y %*% X  ### or
Z <- crossprod(X, Y) %*% X

To compare this calculation with your code:

set.seed(42)
n <- 27
X <- matrix(runif(n*n, 0, 1), nrow=n, ncol=n)
Y <- matrix(runif(n*n, 0, 1), nrow=n, ncol=n)

Z <- matrix(nrow=n, ncol=n)
for (i in 1:ncol(X)) {
  for (j in 1:ncol(X)) {
    cp     <- tcrossprod(X[,i], X[,j])
    Z[i,j] <- sum(cp * Y)
  }
}

Z2 <- t(X) %*% Y %*% X
Z3 <- crossprod(X, Y) %*% X
sum(abs(Z2-Z))
sum(abs(Z3-Z))

If L is the list of your 13 matrices X. you can do:

lapply(L, function(X) crossprod(X, Y) %*% X)

Here is the benchmarking:

Z1 <- function(X) {
  Z <- matrix(nrow=27, ncol=27)
  for (i in 1:ncol(X)) {
    for (j in 1:ncol(X)) {
      cp     <- tcrossprod(X[,i], X[,j])
      Z[i,j] <- sum(cp * Y)
    }
  }
  return(Z)
}

library("microbenchmark")

microbenchmark(Z1=Z1(X), Z2=t(X) %*% Y %*% X, Z3=crossprod(X, Y) %*% X)
#> microbenchmark(Z1=Z1(X), Z2=t(X) %*% Y %*% X, Z3=crossprod(X, Y) %*% X)
#Unit: microseconds
# expr      min        lq       mean    median       uq      max neval cld
#   Z1 3563.167 3671.6355 4391.00888 3721.3380 3874.617 9423.808   100   b
#   Z2   26.558   27.3420   34.31214   35.5865   39.815   56.426   100  a 
#   Z3   24.779   25.1675   27.43546   26.0965   28.034   47.268   100  a

The solutions from Ronak are not faster than the original code, i.e. they are loop-hiding:

fun <- function(x, y) sum(tcrossprod(X[,x], X[,y]) *Y)

microbenchmark(Z1=Z1(X), 
               R1=outer(seq_len(ncol(X)), seq_len(ncol(X)), Vectorize(fun)), 
               R2=t(sapply(seq_len(ncol(X)), function(x) 
                 sapply(seq_len(ncol(X)), function(y)  sum(tcrossprod(X[,x], X[,y]) *Y)))),
               R3=t(apply(X, 2, function(x) apply(X, 2, function(y) sum(tcrossprod(x, y) *Y)))),
               unit="relative")
# Unit: relative
# expr      min       lq     mean   median       uq       max neval cld
#   Z1 1.000000 1.000000 1.000000 1.000000 1.000000  1.000000   100  a 
#   R1 1.207583 1.213846 1.195597 1.216147 1.223139  1.060187   100  ab
#   R2 1.225521 1.230332 1.487811 1.230852 1.299253 13.140022   100   b
#   R3 1.156546 1.158774 1.217766 1.160142 2.012623  1.098679   100  ab

Is there a way to apply a crossproduct to a new column in R using mutate?

You could do:

dataframe %>% mutate(d = as.matrix(.) %*% nums)

#>   a b c   d
#> 1 1 0 1 400
#> 2 1 1 1 600
#> 3 1 0 0 100

dataframe %>% mutate(d = crossprod(t(as.matrix(.)), nums))

#>   a b c   d
#> 1 1 0 1 400
#> 2 1 1 1 600
#> 3 1 0 0 100

R language cross-product combination of two string arrays

You can try

c(outer(a, b, FUN=paste0))
#[1] "2013-03-31" "2014-03-31" "2015-03-31" "2013-06-30" "2014-06-30"
#[6] "2015-06-30"

do.call(paste0,expand.grid(a,b))

sprintf('%s%s', rep(a, length(b)), rep(b, length(a)))

applying cross product (kronecker) many times

The lapply gives a list of the ith row of the matrices and Reduce kroneckers them together. The sapply then assembles that into the final matrix.

N <- nrow(list0[[1]])
sapply(1:N, function(i) Reduce("%x%", init = 1, lapply(rev(list0), "[", i, TRUE)))

How to do a matrix calculation to get the cross products of variables

I'm still new at R, but here's a stab at it anyway. For fun! I have no idea if there's any hope for it to be fast. Probably it's quite naive...

First an example matrix x of num_observations x num_features of small random integers.

num_features <- 5
num_observations <- 20
features <- letters[1:num_features]

x <- replicate(num_features, sample(1:10, num_observations, replace = T))

colnames(x) <- features

All combinations of feature pairs:

combinations <- combn(features, 2)
num_combinations = ncol(combinations)

For each feature pair, we'll multiply the corresponding columns in x. Reserving space for a new matrix where the multiplied columns will end up:

y <- matrix(NA, ncol = num_combinations, nrow = num_observations)
cn <- rep("?", num_combinations) # column names of new features

Multiplying the column combinations:

for (i in 1:num_combinations)
{
  cn[i] <- paste(combinations[1,i], combinations[2,i], sep = ".")
  y[,i] <- x[,combinations[1,i]] * x[,combinations[2,i]]
}
colnames(y) <- cn

Finally merging the original matrix and the additional features:

x <- cbind(x, y)

This only handles multiplication for simplicity, additional features created using division is of course similar.

UPDATE

A nice approach suggested by @nongkrong in the comments forgoes the explicit loop and simply does:

y <- combn(split(x, col(x)), 2, FUN = function(cols) cols[[1]] * cols[[2]])
x <- cbind(x, y)

It doesn't explicitly set the column names of the new features, but it is more elegant and more readable. In some quick timings I did it was also about 30% faster!

Cartesian product data frame

You can use expand.grid(A, B, C)

EDIT: an alternative to using do.call to achieve the second part, is the function mdply from the package plyr:

library(plyr)

d = expand.grid(x = A, y = B, z = C)
d = mdply(d, f)

To illustrate its usage using a trivial function 'paste', you can try

d = mdply(d, 'paste', sep = '+');

What Is R's Crossproduct Function