RcppArmadillo Pass User-Defined Function

RcppArmadillo pass user-defined function

(Sometimes you need to use svn log ... on files to see how dated they are...)

I think a better use case is in my "port" of the C-based DEoptim to Rcpp / RcppArmadillo: RcppDE. In it, I allow the optimization routine to use either an R function (as DEoptim does) or a user-supplied compiled function -- which is what you want here as I understand it.

There is a tiny bit of C++ scaffolding, but you should have no problem following that.

Edit on 2013-01-21: Below is a complete solution, which I have also just posted as a new post at the Rcpp Gallery, including some comments and sample usage.

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

using namespace arma; 
using namespace Rcpp;

vec fun1_cpp(const vec& x) {    // a first function 
    vec y = x + x;
    return (y);
}

vec fun2_cpp(const vec& x) {    // and a second function
    vec y = 10*x;
    return (y);
}

typedef vec (*funcPtr)(const vec& x);

// [[Rcpp::export]]
XPtr<funcPtr> putFunPtrInXPtr(std::string fstr) {
    if (fstr == "fun1")
        return(XPtr<funcPtr>(new funcPtr(&fun1_cpp)));
    else if (fstr == "fun2")
        return(XPtr<funcPtr>(new funcPtr(&fun2_cpp)));
    else
        return XPtr<funcPtr>(R_NilValue); // runtime error: NULL is not a valid XPtr
}

// [[Rcpp::export]]
vec callViaString(const vec x, std::string funname) {
    XPtr<funcPtr> xpfun = putFunPtrInXPtr(funname);
    funcPtr fun = *xpfun;
    vec y = fun(x);
    return (y);
}

// [[Rcpp::export]]
vec callViaXPtr(const vec x, SEXP xpsexp) {
    XPtr<funcPtr> xpfun(xpsexp);
    funcPtr fun = *xpfun;
    vec y = fun(x);
    return (y);
}
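
For readers without an R toolchain at hand, the name-to-function-pointer dispatch above can be sketched in plain C++. This is only an illustration under stated assumptions: `Vec`, `double_it`, `times_ten`, `lookup` and `call_via_string` are hypothetical names, `std::vector<double>` stands in for `arma::vec`, and there is no `XPtr` involved, so nothing here crosses the R boundary.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

using Vec = std::vector<double>;        // stand-in for arma::vec (assumption)
typedef Vec (*FuncPtr)(const Vec&);     // same shape as funcPtr above

Vec double_it(const Vec& x) {           // analogue of fun1_cpp: x + x
    Vec y(x);
    for (double& v : y) v += v;
    return y;
}

Vec times_ten(const Vec& x) {           // analogue of fun2_cpp: 10 * x
    Vec y(x);
    for (double& v : y) v *= 10.0;
    return y;
}

// Map a name to a function pointer; throw rather than hand back a null pointer.
FuncPtr lookup(const std::string& name) {
    if (name == "fun1") return &double_it;
    if (name == "fun2") return &times_ten;
    throw std::invalid_argument("unknown function name: " + name);
}

// Apply whichever function the caller selected by name.
Vec call_via_string(const Vec& x, const std::string& name) {
    FuncPtr fun = lookup(name);
    assert(fun != nullptr);
    return fun(x);
}
```

Throwing on an unknown name is a choice made for this sketch; the Rcpp version above instead relies on the failed `XPtr` construction to raise the error.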

function pass by reference in RcppArmadillo

A double is not a native R type (so there is always a copy being made) and no pass-through reference is possible.

Instead, use Rcpp::NumericVector which is a proxy for a SEXP type. This works:

R> sourceCpp("/tmp/so44047145.cpp")

R> x = 1.0  

R> myfun(x) 
Inside myfun: x = 0.0361444

R> x        
[1] 0.0361444
R>

Below is the full code with another small repair or two:

#include <RcppArmadillo.h>

// [[Rcpp::depends(RcppArmadillo)]]

//[[Rcpp::export]]
void myfun(Rcpp::NumericVector &x){
  arma::mat X = arma::randu<arma::mat>(5,5);
  arma::mat Y = X.t()*X;
  arma::mat R1 = chol(Y);

  x[0] = arma::det(R1);
  Rcpp::Rcout << "Inside myfun: x = " << x << std::endl;
}


/*** R
x = 1.0  # initialize x
myfun(x) # update x to a new value calculated internally
x        # return the new x; it should be different from 1
*/
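
As a loose analogy for why the proxy type behaves differently from a bare double, here is a plain C++ sketch with no Rcpp in it. `VecProxy`, `set_scalar` and `set_via_proxy` are hypothetical names made up for this illustration; the struct only imitates the idea of a lightweight handle onto storage owned elsewhere and is not how Rcpp is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical proxy: it owns nothing and only points at storage elsewhere,
// loosely mirroring how Rcpp::NumericVector wraps an existing SEXP.
struct VecProxy {
    double* data;
    std::size_t size;
    double& operator[](std::size_t i) { assert(i < size); return data[i]; }
};

// The double is copied into the function; the caller's value is untouched.
void set_scalar(double x) { x = 42.0; (void)x; }

// The proxy is copied too, but the copy still points at the same storage.
void set_via_proxy(VecProxy x) { x[0] = 42.0; }
```

Calling `set_scalar` leaves the caller's variable alone, while `set_via_proxy` changes the element the proxy points at, which is the behaviour the `myfun` example relies on.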

using a user defined function in Rcpp (cppFunction)

I would suggest the following:

Study our documentation and examples. We show how to pass functions around too, even if we do not recommend it (for the obvious performance reason: calling R from C++ ain't speedy).
If your somewhat complex example does not work, try a smaller one. At the end of the day you may just want a tester which receives two numbers and passes those to a supplied function.
And lastly: You really want blacksch in C++ too. All the statistical functions are available under the same names.

Use a C++ function as an argument for another C++ function called by an exported Rcpp function

Here's a complete example using the approach Ralf wrote in #1. You can use pure C/C++ function pointers here, although you can do more complex things with C++11.

#include <RcppArmadillo.h>

// [[Rcpp::depends(RcppArmadillo)]]

typedef arma::mat (*functype)(arma::mat&);

arma::mat f1(arma::mat& a){
  return a+1;
}

arma::mat f2(functype g, arma::mat a){
  return g(a);
}

//[[Rcpp::export]]
arma::mat f3(arma::mat a){
  return f2(f1, a);
}

R side:

> f3(matrix(1))
     [,1]
[1,]    2
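
As for the "more complex things with C++11", one possibility is to swap the raw function pointer for `std::function`, so that lambdas with captured state can be passed as well. The following is a sketch in plain C++ rather than RcppArmadillo: `Mat`, `MatFun`, `apply_fun` and `add_offset` are hypothetical names, and `std::vector<double>` stands in for `arma::mat`.

```cpp
#include <cassert>
#include <functional>
#include <vector>

using Mat = std::vector<double>;                 // stand-in for arma::mat (assumption)
using MatFun = std::function<Mat(const Mat&)>;   // replaces the raw functype

// Analogue of f2: apply whatever callable was handed in.
Mat apply_fun(const MatFun& g, const Mat& a) {
    assert(static_cast<bool>(g));   // an empty std::function would throw on call
    return g(a);
}

// Analogue of f3, except that the increment is captured by a lambda,
// which a plain function pointer cannot express.
Mat add_offset(const Mat& a, double offset) {
    MatFun g = [offset](const Mat& m) {
        Mat out(m);
        for (double& v : out) v += offset;
        return out;
    };
    return apply_fun(g, a);
}
```

With an offset of 1 this reproduces the `a+1` of `f1`; the type erasure in `std::function` costs a little per call compared with the raw pointer.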

Why does my RcppParallel implementation of a user-defined function crash unexpectedly?

So following Dirk's suggestion I am posting an answer with a pared back example to illustrate the problem I had and the solution I arrived at with his help.

The mistake I made was actually in how I treated the begin and end variables within my worker. In contrast to the articles in the RcppParallel gallery, I was not using begin/end to guide iterators to the relevant portions of the calculation, but rather trying to use them to index the relevant part of my input dataset for each portion.

This caused dimension errors, which on my machine simply crashed the R session.

The solution to this mistake would be either (1) to ensure any UDFs you are applying deal in iterators rather than vector values or (2) to bridge the begin/end variables correctly to the vectors you are trying to index.

Given that all of my modelling functions are already in the business of taking vector indices, I have applied the second approach and created a unique_indices vector within my function which the begin/end values can simply select values from. The current solution makes some assumptions about how the input indices will work (i.e. simply integer values from smallest to largest in the argument vector).

Apologies if this is still considered verbose, but I thought it worth keeping the data-handling logic as it was in the problem statement because that is where the problem arose. That is where a submatrix is identified by an index and used as the arguments to some calculation. The key differences to the example above are on lines 48-52 and 62-65

Here (1) each i between begin and end is used to select an index, as in int index_value = unique_indices[i] ; which then identifies the relevant input data, and (2) the unique_indices vector is defined by the characteristics of the vector of indices vec_ind.

// [[Rcpp::depends(RcppArmadillo, RcppParallel)]]
#include <string>
#include <algorithm>
#include <vector>
#include <numeric>   // std::iota
#include <math.h>
#include <RcppArmadillo.h>
#include <RcppParallel.h>
using namespace RcppParallel;

//[[Rcpp::export]]

std::vector<double> allwhich_ts(std::vector<double> vector, double value){
  int length = vector.size() ;
  std::vector<double> values(length) ;
  int matches = 0;
  for(int i = 0; i < length; i++){
    bool match = vector[i] == value;
    if(match){values[matches] = i;
      matches++ ;}}
  std::vector<double> op(values.begin(), values.begin() + matches) ;
  return(op);
}

struct vector_double_worker : public Worker {
  // Defining worker arguments
  const RVector<double> vector1 ;
  const RVector<double> vector_indices ;
  const RVector<double> unique_indices ;
  const int vector_length ;
  RVector<double> output_vec ;
  // Initialising function argument values
  vector_double_worker(
    const Rcpp::NumericVector& vector1, const Rcpp::NumericVector& vector_indices,
    const Rcpp::NumericVector& unique_indices, const int& vector_length, Rcpp::NumericVector& output_vec
  ) : vector1(vector1),vector_indices(vector_indices),unique_indices(unique_indices),
      vector_length(vector_length),output_vec(output_vec) {}
  // Setting up conversion function so that UDFs can deal in std:: types
  std::vector<double> convert_input_vec(RVector<double> input_vector, int vec_length){
    std::vector<double> input_vector_ts(input_vector.begin(), input_vector.end()) ;
    return(input_vector_ts) ;}
  // Defining operator ranges which will breakdown the task into partitions
  void operator()(std::size_t begin, std::size_t end){
  // Converting input vectors to std types
    std::vector<double> vector1_ts = convert_input_vec(vector1, vector_length) ;
    std::vector<double> vector_indices_ts = convert_input_vec(vector_indices, vector_length) ;
  // For loop to perform calculations for each element in a given partition
    for(unsigned int i = begin; i < end; i++){
      int index_value = unique_indices[i] ; // begin and end now used to index the vector of input indices defined outside of the function
      std::vector<double> indices = allwhich_ts(vector_indices_ts, index_value) ; // identifying sub-vector indices
      int values_begin = indices.at(0) ;
      int values_end = indices.at(std::distance(indices.begin(), indices.end()) - 1) ; // - 1 was added to avoid dimension error
      std::vector<double> values1(vector1_ts.begin() + values_begin, vector1_ts.begin() + values_end + 1) ; // + 1 was added to avoid dimension error
      int op_size = values1.size() ;
      for(int n = 0; n < op_size; n++){output_vec[i*op_size + n] = values1[n] * 2 ;} // Trivial example calculation
    }}};

//[[Rcpp::export]]

Rcpp::NumericVector vector_double_parallel(Rcpp::NumericVector vec1, Rcpp::NumericVector vec_ind){
  int vec_length = vec1.size() ; // Setting up output vector
  Rcpp::NumericVector op_vec(vec_length);
  double n_indices = *std::max_element(vec_ind.begin(), vec_ind.end()) ; // Identifying unique index values
  double min_indices = *std::min_element(vec_ind.begin(), vec_ind.end()) ;
  Rcpp::NumericVector unique_indices(n_indices) ;
  std::iota(unique_indices.begin(), unique_indices.end(), min_indices);
  vector_double_worker vec_2_worker(vec1,vec_ind,unique_indices,vec_length,op_vec) ; // Setting up parallel worker
  parallelFor(0, n_indices, vec_2_worker) ; // Populating output vector with results
  return(op_vec) ;}
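
The bridging idea can also be looked at in isolation, without RcppParallel. Below is a serial sketch in plain C++ in which begin/end are positions in the vector of unique ids rather than positions in the data. `process_range` and `distinct_ids` are hypothetical names, and unlike the worker above the sketch writes each result back to the position of its input element, which is a simplification chosen for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical serial stand-in for the worker's operator(): begin/end are
// positions in unique_ids, not positions in the data vector itself.
void process_range(const std::vector<double>& values,
                   const std::vector<int>& group_of,     // group id per element
                   const std::vector<int>& unique_ids,   // one entry per group
                   std::vector<double>& out,
                   std::size_t begin, std::size_t end) {
    assert(values.size() == group_of.size() && out.size() == values.size());
    assert(begin <= end && end <= unique_ids.size());
    for (std::size_t i = begin; i < end; ++i) {
        const int id = unique_ids[i];   // bridge: range position -> group id
        for (std::size_t j = 0; j < values.size(); ++j) {
            if (group_of[j] == id) out[j] = values[j] * 2.0;   // trivial calculation
        }
    }
}

// Collect the distinct group ids in ascending order.
std::vector<int> distinct_ids(std::vector<int> ids) {
    std::sort(ids.begin(), ids.end());
    ids.erase(std::unique(ids.begin(), ids.end()), ids.end());
    return ids;
}
```

Calling `process_range` on (0, 2) and then on (2, 3) covers three groups in two chunks, which is roughly what a parallel scheduler would do with the same range. Because the ids are looked up rather than assumed, they do not have to start at 1 or be consecutive.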

Rcpp function to construct a function

OK, as far as I understand, you want a function that returns a function with a closure, i.e. "the function defined in the closure 'remembers' the environment in which it was created."

In C++11 and up it is quite possible to define such a function, along the lines of

std::function<double(double)> createax2Rcpp(int a) {
    auto ax2 = [a](double x) {  return(double(a) * pow(x, 2)); };
    return ax2;
}

What happens is that an anonymous class with an overloaded operator() is created, along with an object of that class; the object captures the closure and is moved out of the creator function. The return value is then stored in an instance of std::function, with type erasure etc.

But! A C/C++ function callable from R is required to be of a certain type, which is narrower (as opposed to wider: you can capture a narrow object into a wide one, but not vice versa).

Thus, I don't know how to make a proper R function from a std::function; it looks like it is impossible.

Perhaps emulating the closure as below might help:

static int __a;

double ax2(double x) {
    return(__a * pow(x, 2));
}

Rcpp::Function createax2Rcpp(int a) {
    __a = a;

    return (ax2);
}
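
One caveat of the static-variable emulation is that there is only a single shared value, so a later call to the creator changes what every earlier "closure" computes. The sketch below contrasts the two approaches in plain C++ with no Rcpp types; `make_ax2`, `make_ax2_static`, `shared_a`, `ax2_static` and `PlainFun` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <functional>

// Each call returns a callable that carries its own copy of a.
std::function<double(double)> make_ax2(int a) {
    return [a](double x) { return static_cast<double>(a) * x * x; };
}

// Static-variable emulation: a plain function pointer, but only one shared a.
static int shared_a = 0;

double ax2_static(double x) { return shared_a * x * x; }

typedef double (*PlainFun)(double);

PlainFun make_ax2_static(int a) {
    shared_a = a;   // overwrites the value seen by earlier "closures"
    return &ax2_static;
}
```

Creating closures for a = 2 and then a = 3 with `make_ax2` gives two independent functions, whereas both pointers returned by `make_ax2_static` end up computing 3 * x^2.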
