Passing Large Matrices to Rcpparmadillo Function Without Creating Copy (Advanced Constructors)

Passing large matrices to RcppArmadillo function without creating copy (advanced constructors)

This has been discussed extensively in the Rcpp mailing list. See this thread. The solution that has been implemented in RcppArmadillo is to pass the arma::mat by reference. Internally this will call the advanced constructor for you.

So with this version, you would do something like this:

#include <RcppArmadillo.h>

// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::export]]
arma::vec test(const arma::mat& M) {
    // do whatever with M
    ...
}

And the data from the R matrix is not copied but rather borrowed. More details in the thread.

Here are some benchmarks comparing the time it takes to copy or pass by reference:

                 expr      min        lq    median        uq      max neval
    arma_test_value(m) 3540.369 3554.4105 3572.3305 3592.5795 4168.671   100
      arma_test_ref(m)    4.046    4.3205    4.7770   15.5855   16.671   100
arma_test_const_ref(m)    3.994    4.3660    5.5125   15.7355   34.874   100

With these functions:

#include <RcppArmadillo.h>
using namespace Rcpp ;

// [[Rcpp::depends("RcppArmadillo")]]

// [[Rcpp::export]]
void arma_test_value( arma::mat x){}

// [[Rcpp::export]]
void arma_test_ref( arma::mat& x){}

// [[Rcpp::export]]
void arma_test_const_ref( const arma::mat& x){}

Large Matrices in RcppArmadillo via the ARMA_64BIT_WORD define

RcppArmadillo is an Rcpp-only package due to the contents in the /src directory. To enable C++11 use // [[Rcpp::plugins(cpp11)]]
ARMA_64BIT_WORD is not defined. To define it add #define ARMA_64BIT_WORD 1 prior to the #include <RcppArmadillo.h>.

Sample implementation using sourceCpp()

#define ARMA_64BIT_WORD 1
#include <RcppArmadillo.h>
// [[Rcpp::depends(RcppArmadillo)]]
// [[Rcpp::plugins(cpp11)]] 

// [[Rcpp::export]] 
arma::mat cosCpp(const arma::mat& X) {

    arma::mat Y = arma::trans(X) * X; // matrix product
    arma::mat res = Y / (arma::sqrt(arma::diagvec(Y)) * arma::trans(arma::sqrt(arma::diagvec(Y))));

    return res;
}

To define it within the /src/Makevars{.win} for a package use:

PKG_CPPFLAGS = -DARMA_64BIT_WORD=1

convert from C array (column-major) to armadillo matrix (arma::mat) without copying

From the documentation of armadillo:
(http://arma.sourceforge.net/docs.html#Mat)

Advanced constructors
mat(ptr_aux_mem, n_rows, n_cols, copy_aux_mem = true, strict = false)  
Create a matrix using data from writable auxiliary (external) memory, where ptr_aux_mem is a pointer to the memory. By default the matrix allocates its own memory and copies data from the auxiliary memory (for safety). However, if copy_aux_mem is set to false, the matrix will instead directly use the auxiliary memory (ie. no copying); this is faster, but can be dangerous unless you know what you're doing!

The strict parameter comes into effect only when copy_aux_mem is set to false (ie. the matrix is directly using auxiliary memory)

when strict is set to false, the matrix will use the auxiliary memory until a size change

when strict is set to true, the matrix will be bound to the auxiliary memory for its lifetime; the number of elements in the matrix can't be changed

the default setting of strict in versions 6.000+ is false

the default setting of strict in versions 5.600 and earlier is true

Deciding between NumericVector and arma::vec in Rcpp

What are the key differences between those two?

The *Vector and *Matrix classes in Rcpp act as wrappers for R's SEXP representation, e.g. an S expression that is as a pointer to the data. For details, please see Section 1.1 SEXPs of R Internals.Rcpp's design leverages this by constructing C++ objects from classes that enclose the pointer to the data. This promotes two key features:

Seamless transference between R and C++ objects, and
Low transference cost between R and C++ as only a pointer is passed.
- as the data isn't copied but referenced

Meanwhile, arma objects are akin to a traditional std::vector<T> in the way that a deep copy occurs between the R and C++ objects. There is one exception to this statement, the presence of the advanced constructor, which allows for the memory behind the R object to be reused inside of an armadillo object's structure. Thus, if you are not careful, you may incur an unnecessary penalty during the transition from R to C++ and vice versa.

Note: The advanced constructor that allows the reuse of memory does not exist for arma::sp_mat. Thus, using references with sparse matrices will likely not yield the desired speed up as a copy is performed from R to C++ and back.

You can view the differences based largely on the "pass-by-reference" or "pass-by-copy" paradigm. To understand the difference outside of code, consider the following GIF by mathwarehouse:

To illustrate this scenario in code, consider the following three functions:

#include <RcppArmadillo.h>

// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
void memory_reference_double_ex(arma::vec& x, double value) {
    x.fill(value);
}

// [[Rcpp::export]]
void memory_reference_int_ex(arma::ivec& x, int value) {
    x.fill(value);
}

// [[Rcpp::export]]
arma::vec memory_copy_ex(arma::vec x, int value) {
    x.fill(value);
    return x;
}

The two functions memory_reference_double_ex() and memory_reference_int_ex() will update the object inside of R assuming the appropriate data type is present. As a result, we are able to avoid returning a value by specifying void in their definitions since the memory allocated by x is being reused. The third function, memory_copy_ex() requires a return type since it passes-by-copy and, thus, does not modify the existing storage without a reassignment call.

To emphasize:

The x vector will be passed into C++ by reference, e.g. & on the end of arma::vec& or arma::ivec&, and
The class of x in R is either double or integer meaning we are matching the underlying type of arma::vec, e.g Col<double>, or arma::ivec, e.g. Col<int>.

Let's quickly take a look at two examples.

Within the first example, we will look at the results from running memory_reference_double_ex() and compare it to the results generated by memory_copy_ex(). Note, the types between the objected defined in R and C++ are the same (e.g. double). In the next example, this will not hold.

x = c(0.1, 2.3, 4.8, 9.1)
typeof(x)
# [1] "double"

x
# [1] 0.1 2.3 4.8 9.1

# Nothing is returned...
memory_reference_double_ex(x, value = 9)

x
# [1] 9 9 9 9

a = memory_copy_ex(x, value = 3)

x
# [1] 9 9 9 9

a
#      [,1]
# [1,]    3
# [2,]    3
# [3,]    3
# [4,]    3

Now, what happens if the underlying type of the R object is an integer instead of a double?

x = c(1L, 2L, 3L, 4L)
typeof(x)
# [1] "integer"

x
# [1] 1 2 3 4

# Return nothing...
memory_reference_double_ex(x, value = 9)

x
# [1] 1 2 3 4

What happened? Why didn't x get updated? Well, behind the scenes Rcpp created a new memory allocation that was the proper type -- double and not int -- before passing it onto armadillo. This caused the reference "linkage" between the two objects to differ.

If we change to using an integer data type in the armadillo vector, notice we now have the same effect given previously:

memory_reference_int_ex(x, value = 3)

x
# [1] 3 3 3 3

This leads to discussion on the usefulness of these two paradigms. As speed is the preferred benchmark when working with C++, let's view this in terms of a benchmark.

Consider the following two functions:

#include <RcppArmadillo.h>

// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
void copy_double_ex(arma::vec x, double value) {
    x.fill(value);
}

// [[Rcpp::export]]
void reference_double_ex(arma::vec& x, double value) {
    x.fill(value);
}

Running a microbenchmark over them yields:

# install.packages("microbenchmark")
library("microbenchmark")

x = rep(1, 1e8)

micro_timings = microbenchmark(copy_double_ex(x, value = 9.0),
                               reference_double_ex(x, value = 9.0))
autoplot(micro_timings)
micro_timings

# Unit: milliseconds
#                               expr       min        lq      mean    median        uq      max neval
#       copy_double_ex(x, value = 9) 523.55708 529.23219 547.22669 536.71177 555.00069 640.5020   100
#  reference_double_ex(x, value = 9)  79.78624  80.70757  88.67695  82.44711  85.73199 308.4219   100

Sample Image

Note: The referenced object is ~ 6.509771 times faster per iteration than the copied paradigm as we do not have to reallocate and fill that memory.

When to use which?

What do you need to do?

Are you just trying to quickly speed up an algorithm that relies on a loop but lacks the need for rigorous linear algebra manipulation?

If so, just using Rcpp should suffice.

Are you trying to perform linear algebra manipulations?
Or are you hoping to use this code across multiple libraries or computational platforms (e.g. MATLAB, Python, R, ...)?

If so, you should be writing the crux of the algorithm in armadillo and setting up the appropriate hooks to export the functions into R with Rcpp.

Is there a performance/memory advantage of using one over the other?

Yes, as indicated previously, there is definitely a performance / memory advantage. Not only that, but by using RcppArmadillo you are effectively adding an additional library ontop of Rcpp and, thus, increasing the overall installation footprint, compile time, and system requirements (see the woes of macOS builds). Figure out what is required by your project and then opt for that structure.

Are the only difference the member functions?

Not only member functions, but:

estimation routines in terms of matrix decomposition
computing statistical quantity values
object generation
sparse representation (avoid manipulating an S4 object)

These are fundamental differences between Rcpp and armadillo. One is meant to facilitate transference of R objects into C++ whereas the other is meant for more rigorous linear algebra computations. This should be largely evident as Rcpp does not implement any matrix multiplication logic whereas armadillo uses the system's Basic Linear Algebra Subprograms (BLAS) to perform the computation.

And, as a bonus question: should I even consider arma::colvec or arma::rowvec?

Depends on how you want the result to be returned. Do you want to have an: 1 x N (row vector) or N x 1 (column vector)? RcppArmadillo by default returns these structures as matrix objects with their appropriate dimensions and not a traditional 1D R vector.

As an example:

#include <RcppArmadillo.h>

// [[Rcpp::depends(RcppArmadillo)]]

// [[Rcpp::export]]
arma::vec col_example(int n) {
    arma::vec x = arma::randu<arma::vec>(n);
    return x;
}

// [[Rcpp::export]]
arma::rowvec row_example(int n) {
    arma::rowvec x = arma::randu<arma::rowvec>(n);
    return x;
}

Test:

set.seed(1)
col_example(4)
#           [,1]
# [1,] 0.2655087
# [2,] 0.3721239
# [3,] 0.5728534
# [4,] 0.9082078

set.seed(1)
row_example(4)
#           [,1]      [,2]      [,3]      [,4]
# [1,] 0.2655087 0.3721239 0.5728534 0.9082078

Passing Large Matrices to Rcpparmadillo Function Without Creating Copy (Advanced Constructors)