Fast Large Matrix Multiplication in R

There are many ways to approach this, depending on your code, your effort, and your hardware.

  1. Use the 'best' function for the job

The simplest is to use crossprod(a, b), which is the same as t(a) %*% b but avoids forming the transpose explicitly. (Note: this will only be a small increase in speed.)

crossprod(a, b)
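
As a quick sanity check (a sketch, not from the original answer; timings vary with your BLAS and hardware):

library(microbenchmark)
a <- matrix(rnorm(1e6), 1000, 1000)
b <- matrix(rnorm(1e6), 1000, 1000)
all.equal(crossprod(a, b), t(a) %*% b)            # TRUE
microbenchmark(t(a) %*% b, crossprod(a, b), times = 10)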

  2. Use Rcpp (and likely RcppEigen/RcppArmadillo)

C++ will likely give a greater increase in speed, and using dedicated linear algebra libraries (hence Eigen and Armadillo) will likely help further. This, however, assumes you are willing to write some C++; see the benchmark in the EDIT below.


  3. Use a better backend

After this point you are looking at BLAS backends such as OpenBLAS, ATLAS, etc. Hooking these up to R varies depending upon your OS; it is quite easy on a Debian system like Ubuntu, where the backend can be swapped through the package manager. These backends can sometimes be leveraged further by libraries such as Armadillo and Eigen.
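
To confirm which backend R is actually using, sessionInfo() reports the BLAS/LAPACK library paths (R >= 3.4). The update-alternatives call below is a sketch for Debian/Ubuntu; the exact alternative name varies by release:

sessionInfo()   # look for the "Matrix products", "BLAS" and "LAPACK" lines

# Outside R, on Debian/Ubuntu (assumes e.g. libopenblas-dev is installed):
#   sudo update-alternatives --config libblas.so.3-x86_64-linux-gnu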


  4. GPU Computing

If you have a GPU (AMD, NVIDIA, etc.), you can leverage its many cores to greatly speed up your computations. A few packages could be useful here, including gpuR, gputools, and gmatrix.
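
A minimal sketch with gpuR, assuming a working OpenCL driver and a card that supports double precision (otherwise use type = "float"); the conversion back to a base matrix may vary by gpuR version:

library(gpuR)

A <- matrix(rnorm(1e6), 1000, 1000)
B <- matrix(rnorm(1e6), 1000, 1000)

gpuA <- gpuMatrix(A, type = "double")   # host-side matrix with GPU methods
gpuB <- gpuMatrix(B, type = "double")

gpuC <- gpuA %*% gpuB                   # the product is computed on the GPU
C <- as.matrix(gpuC)                    # bring the result back as a base matrix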

EDIT - to address @jenesaiquoi's comment on the benefit of Rcpp

test.cpp

// [[Rcpp::depends(RcppArmadillo, RcppEigen)]]

#include <RcppArmadillo.h>
#include <RcppEigen.h>

// Multiply two matrices via Armadillo (copies A and B on the way in)
// [[Rcpp::export]]
SEXP armaMatMult(arma::mat A, arma::mat B){
    arma::mat C = A * B;

    return Rcpp::wrap(C);
}

// Multiply two matrices via Eigen (also copies A and B)
// [[Rcpp::export]]
SEXP eigenMatMult(Eigen::MatrixXd A, Eigen::MatrixXd B){
    Eigen::MatrixXd C = A * B;

    return Rcpp::wrap(C);
}

// Same as above, but Eigen::Map reuses R's memory instead of copying
// [[Rcpp::export]]
SEXP eigenMapMatMult(const Eigen::Map<Eigen::MatrixXd> A, Eigen::Map<Eigen::MatrixXd> B){
    Eigen::MatrixXd C = A * B;

    return Rcpp::wrap(C);
}

test.R

library(Rcpp)

A <- matrix(rnorm(10000), 100, 100)
B <- matrix(rnorm(10000), 100, 100)

library(microbenchmark)
sourceCpp("test.cpp")
microbenchmark(A%*%B, armaMatMult(A, B), eigenMatMult(A, B), eigenMapMatMult(A, B))

Unit: microseconds
                  expr     min       lq     mean   median       uq      max neval
               A %*% B 885.846 892.1035 933.7457 901.1010 938.9255 1411.647   100
     armaMatMult(A, B) 846.688 857.6320 915.0717 866.2265 893.7790 1421.557   100
    eigenMatMult(A, B) 205.978 208.1295 233.1882 217.0310 229.4730  369.369   100
 eigenMapMatMult(A, B) 192.366 194.9835 207.1035 197.5405 205.2550  366.945   100

Matrix multiplication speeds in R as fast as in Python?

I use MRO and Python with Anaconda, both linked against the MKL BLAS. Here are my results for the same data generating process, i.e. np.random.rand ('float64') or rnorm, with identical dimensions (average and standard deviation over 10 replications):

Python:

np.dot(A, B) # 1.3616 s (sd = 0.1776)

R:

Bt = t(B)                  # precompute the transpose once
a = A %*% B                # 2.0285 s (sd = 0.1897)
acp = tcrossprod(A, Bt)    # 1.3098 s (sd = 0.1206); tcrossprod(A, Bt) is A %*% t(Bt), i.e. A %*% B
identical(acp, a)          # TRUE

Why is this naive matrix multiplication faster than base R's?

A quick glance through names.c points you to do_matprod, the C function that is called by %*% and which is found in the file array.c. (Interestingly, it turns out that both crossprod and tcrossprod dispatch to that same function as well.)

Scrolling through the function, you can see that it takes care of a number of things your naive implementation does not, including:

  1. Keeps row and column names, where that makes sense.
  2. Allows for dispatch to alternative S4 methods when the two objects being operated on by a call to %*% are of classes for which such methods have been provided.
  3. Handles both real and complex matrices.
  4. Implements a series of rules for how to handle multiplication of a matrix and a matrix, a vector and a matrix, a matrix and a vector, and a vector and a vector. (Recall that in R, a vector on the LHS of %*% is treated as a row vector, whereas on the RHS it is treated as a column vector; this is the code that makes that so. See the short example after this list.)
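
For instance, the vector rules in item 4 are easy to see at the prompt:

v <- 1:3
m <- diag(3)
v %*% m    # v is treated as a 1 x 3 row vector; result is 1 x 3
m %*% v    # v is treated as a 3 x 1 column vector; result is 3 x 1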

Near the end of the function, it dispatches to either matprod or cmatprod. Interestingly (to me at least), in the case of real matrices, if either matrix might contain NaN or Inf values, then matprod dispatches to a function called simple_matprod, which is about as simple and straightforward as your own. Otherwise, it dispatches to one of a couple of BLAS Fortran routines which are presumably faster, if uniformly 'well-behaved' matrix elements can be guaranteed.
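
Since R 3.4.0 you can toggle this behaviour yourself via the matprod option, which makes the cost of the simple path easy to measure (a sketch; timings depend on your BLAS):

library(microbenchmark)
A <- matrix(rnorm(250000), 500, 500)
B <- matrix(rnorm(250000), 500, 500)

options(matprod = "blas")       # always call BLAS, even with NaN/Inf present
microbenchmark(A %*% B, times = 10)

options(matprod = "internal")   # naive triple loop, fully NaN/Inf-safe
microbenchmark(A %*% B, times = 10)

options(matprod = "default")    # restore the check-then-dispatch behaviour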

Fastest way to multiply a matrix by a vector

sweep seems to run a bit faster on my machine:

sweep(mat, 2, v, FUN = "*")

Some benchmarks:

> microbenchmark(mat %*% diag(v), sweep(mat, 2, v, FUN = "*"))

Unit: milliseconds
                        expr       min        lq   median        uq      max neval
             mat %*% diag(v) 214.66700 226.95551 231.2366 255.78493 349.1911   100
 sweep(mat, 2, v, FUN = "*")  42.42987  44.72254  62.9990  70.87403 127.2869   100
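
For what it's worth, plain recycling tends to beat both of the above, since it avoids building diag(v) and sweep's overhead entirely (a sketch, not part of the original answer):

mat <- matrix(rnorm(1e6), 1000, 1000)
v <- rnorm(1000)
scaled <- mat * rep(v, each = nrow(mat))         # scales column j by v[j]
all.equal(scaled, sweep(mat, 2, v, FUN = "*"))   # TRUE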

Speed up sparse matrix multiplication in R

You can create a sparse matrix using the Matrix package. Matrix/vector multiplication may be faster in this case. For example:

library(Matrix)
library(tictoc)
set.seed(123)
v <- sample(1e4)
m <- Matrix(sample(c(0, 1), length(v)^2, TRUE, c(.99, .01)),
            length(v), length(v), sparse = FALSE)  # dense 1e4 x 1e4, ~1% ones
sm <- Matrix(m, sparse = TRUE)                     # same data, sparse storage
tic("dense")
x <- m %*% v
toc()
#> dense: 0.094 sec elapsed
tic("sparse")
y <- sm %*% v
toc()
#> sparse: 0.006 sec elapsed
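
If the data are this sparse, it is usually better never to materialize the dense matrix at all; one way is to build it directly with Matrix::sparseMatrix (the construction below is an illustrative assumption, not part of the original answer):

library(Matrix)
set.seed(123)
n <- 1e4
nnz <- round(0.01 * n^2)                    # ~1% nonzero entries
sm2 <- sparseMatrix(i = sample(n, nnz, replace = TRUE),
                    j = sample(n, nnz, replace = TRUE),
                    x = 1, dims = c(n, n))  # duplicate (i, j) pairs are summed
y2 <- sm2 %*% sample(n)                     # stays in sparse routines throughout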

Fastest way to Hadamard-multiply all matrix columns with another matrix

You can try apply(A, 2, '*', B); to get the same result as colmat_prod, wrap it as array(apply(A, 2, '*', B), c(dim(B), ncol(A))):

identical(array(apply(A, 2, '*', B), c(dim(B), ncol(A))), colmat_prod(A, B))
#[1] TRUE

Another option is to use rep for the columns of A:

array(A[,rep(seq_len(ncol(A)), each=ncol(B))] * as.vector(B), c(dim(B), ncol(A)))
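
colmat_prod itself is not shown in this excerpt; a plausible reconstruction, consistent with the identical() check above and labeled hypothetical, is:

# Hypothetical reference implementation: scale B element-wise by each
# column of A in turn, stacking the results into a dim(B) x ncol(A) array.
colmat_prod <- function(A, B) {
  out <- array(0, c(dim(B), ncol(A)))
  for (j in seq_len(ncol(A))) {
    out[, , j] <- B * A[, j]
  }
  out
}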

Timings:

library(microbenchmark)
microbenchmark(colmat_prod(A1, B1),
               colmat_prod_vec(A1, B1),
               array(apply(A1, 2, '*', B1), c(dim(B1), ncol(A1))),
               array(A1[, rep(seq_len(ncol(A1)), each = ncol(B1))] * as.vector(B1),
                     c(dim(B1), ncol(A1))),
               times = 10)
#Unit: milliseconds
#                                                                expr      min        lq      mean    median        uq       max neval cld
#                                                 colmat_prod(A1, B1) 831.5437  857.0305  910.5694  878.6842  999.5354 1025.0915    10   c
#                                             colmat_prod_vec(A1, B1) 981.9241 1010.9482 1174.1700 1162.7004 1319.3478 1444.6158    10   d
#                  array(apply(A1, 2, "*", B1), c(dim(B1), ncol(A1))) 716.1469  725.7862  765.4987  732.2520  789.3843  907.4417    10   b
# array(A1[, rep(seq_len(ncol(A1)), each = ncol(B1))] * as.vector(B1), c(dim(B1), ncol(A1))) 404.8460 406.2848 430.4043 428.2685 458.9400 462.0634 10 a

