Passing a `data.table` to c++ functions using `Rcpp` and/or `RcppArmadillo`
Building on top of other answers, here is some example code:
#include <Rcpp.h>
using namespace Rcpp ;
// [[Rcpp::export]]
double do_stuff_with_a_data_table(DataFrame df){
CharacterVector x = df["x"] ;
NumericVector y = df["y"] ;
IntegerVector z = df["v"] ;
/* do whatever with x, y, v */
double res = sum(y) ;
return res ;
}
So, as Matthew says, this treats the data.table
as a data.frame
(aka a Rcpp::DataFrame
in Rcpp
).
require(data.table)
DT <- data.table(
x=rep(c("a","b","c"),each=3),
y=c(1,3,6),
v=1:9)
do_stuff_with_a_data_table( DT )
# [1] 30
This completely ignores the internals of the data.table
.
Rcpp function to construct a function
Ok, as far as I understand, you want a function returning function with a closure, a.k.a. " the function defined in the closure 'remembers' the environment in which it was created."
In C++11 and up it is quite possible to define such function, along the lines
std::function<double(double)> createax2Rcpp(int a) {
auto ax2 = [a](double x) { return(double(a) * pow(x, 2)); };
return ax2;
}
What happens, the anonymous class and object with overloaded operator()
will be created, it will capture the closure and moved out of the creator function. Return will be captured into instance of std::function with type erasure etc.
But! C/C++ function in R requires to be of a certain type, which is narrower (as an opposite to wider, you could capture narrow objects into wide one, but not vice versa).
Thus, I don't know how to make from std::function a proper R function, looks like it is impossible.
Perhaps, emulation of the closure like below might help
static int __a;
double ax2(double x) {
return(__a * pow(x, 2));
}
Rcpp::Function createax2Rcpp(int a) {
__a = a;
return (ax2);
}
Rcpp/RcppArmadillo C++/R balance for Performance
The RcppArmadillo code is running ~3x slower then native R. A hybrid RcppArmadillo & R variant of the code is running ~1.10 times faster then R.
You must be copying a lot. That is simply not plausible. Maybe time to profile your code?
Take any posted example. Maybe from the Rcpp Gallery which has multiple posts. Maybe from the by now 250 (!!) CRAN packages using RcppArmadillo. Start with something simple, and time it.
You should see the ~ 50x factor for loops converted from R to C++. You should see something close to maybe 1.5 times faster for purely vectorized code (as R tends to do more error checking that code we write in small extensions).
Edit: Here is a trivial benchmark showing how trivial it is to set one up. You really can (and should !!) time all portions you suspect are inefficient. We all make errors, both in design and in execution. The good thing is that you have both tools and plenty of posted examples to guide you.
R> Rfunc <- function(N) { s <- 0; for (i in 1:N) for (j in 1:N) s <- s+1; s }
R> Rfunc(10)
[1] 100
R>
R> library(Rcpp)
R> cppFunction("double Cppfunc(int N) { double s=0; for (int i=0; i<N; i++) for (int j=0; j<N; j++) s++; return(s); }")
R> Cppfunc(10)
[1] 100
R>
R> library(rbenchmark)
R> N <- 1000
R> benchmark(Rfunc(N), Cppfunc(N), order="relative")[,1:4]
test replications elapsed relative
2 Cppfunc(N) 100 0.073 1.000
1 Rfunc(N) 100 12.596 172.548
R>
Rewriting R's cummin() function using Rcpp and allowing for NAs
I would use this:
template<typename T, int RTYPE>
Vector<RTYPE> cummin_cpp2(Vector<RTYPE> x, bool narm){
Vector<RTYPE> res = clone(x);
int i = 1, n = res.size();
T na;
if(narm){
// Ignore NAs
for(; i < n; i++){
if(ISNAN(res[i]) || (res[i-1] < res[i])) res[i] = res[i-1];
}
} else{
// Do not ignore NAs
for(; i < n; i++){
if(ISNAN(res[i-1])) {
na = res[i-1];
break;
} else if(res[i-1] < res[i]){
res[i] = res[i-1];
}
}
for(; i < n; i++){
res[i] = na;
}
}
return res;
}
// [[Rcpp::export]]
SEXP cummin_cpp2(SEXP x, bool narm = false) {
switch (TYPEOF(x)) {
case INTSXP: return cummin_cpp2<int, INTSXP>(x, narm);
case REALSXP: return cummin_cpp2<double, REALSXP>(x, narm);
default: Rcpp::stop("SEXP Type Not Supported.");
}
}
Try this on:
x <- c(NA, 7, 5, 4, NA, 2, 4)
x2 <- as.integer(x)
cummin_cpp(x, narm = TRUE)
x
cummin_cpp(x2)
x2
x <- c(NA, 7, 5, 4, NA, 2, 4)
x2 <- as.integer(x)
x3 <- replace(x, is.na(x), NaN)
cummin_cpp2(x, narm = TRUE)
x
cummin_cpp2(x2)
x2
cummin_cpp2(x3)
x3
Explanation:
- Joran's advice is good, just wrap that in an R function
- I use a dispatcher as Joseph Wood suggested
- Beware that
x
is passed by reference and is modified if of the same type of what you declared (see these 2 slides) - You need to handle
NA
as well asNaN
- You can use
||
instead of|
to evaluate only the first condition if it is true.
Returning bunch of matrices using RCPP in C++ in an efficient way using a list
In my other answer I looked at the efficiency of returning data and at simple optimizations. Here I want to look at something different: Optimization of the algorithm.
You want to compute hhat(i, t)
for 0 <= i, t < n
and i < t
. Looking at your formula we see that the dependency of hhat
on i
and t
is very different. In particular, hhat(i, t + 1)
can be written as hhat(i, t) + something
. Right now your outer loop is over t
and you are recomputing all these intermediate values. By switching the loop order, it is easy to do each such computation only once, bringing the algorithm down to a two nested loops. This means you have to generate the resulting matrices separately. And since you cannot store an arma::mat
inside a Rcpp::List
, I need an additional std::vector
for storage:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::export]]
Rcpp::List hello(
const arma::rowvec& g,
const int& n,
const int& p,
const arma::mat& S,
const arma::mat& zc,
const arma::rowvec& dl){
std::vector<arma::mat> foo(n);
for(int t=0; t < n;++t){
arma::mat hhat(p,n);
hhat.zeros();
foo[t] = hhat;
}
for(int i = 0;i < n; ++i){
arma::mat h = exp(arma::as_scalar(g*zc.row(i).t())) * (zc.row(i).t() - S.col(i))*dl(i);
for(int t=i+1; t < n;++t){
h += exp(arma::as_scalar(g*zc.row(i).t())) * (zc.row(i).t() - S.col(t))*dl(t);
foo[t].col(i) = h;
}
}
Rcpp::List ht(n);
for(int t=0; t < n;++t){
ht[t] = foo[t];
}
return(ht);
}
// [[Rcpp::export]]
Rcpp::List hello_orig(
const arma::rowvec& g,
const int& n,
const int& p,
const arma::mat& S,
const arma::mat& zc,
const arma::rowvec& dl){
Rcpp::List ht(n);
for(int t=0; t < n;++t){
arma::mat hhat(p,n);
hhat.zeros();
for(int i = 0;i < n; ++i){
arma::mat h(p,1);
h.zeros();
if (t > i){
for(int u=i;u <= t; ++u){
h += exp(arma::as_scalar(g*zc.row(i).t())) * (zc.row(i).t() - S.col(u))*dl(u);
}
}
hhat.col(i) = h;
}
ht[t] = hhat;
}
return(ht);
}
/***R
g=c(1,2.1,3.1)
n=1600
p=3
S = matrix(rnorm(p*n),nrow=p,ncol=n)
dl=runif(n)
z=matrix(runif(p*n),nrow=n,ncol=p)
bench::mark(hello_orig(g=g,n=n,p=p,S = S,zc=z,dl = dl),
hello(g=g,n=n,p=p,S = S,zc=z,dl = dl))
*/
Result:
# A tibble: 2 x 13
expression min median `itr/sec` mem_alloc
<bch:expr> <bch:> <bch:> <dbl> <bch:byt>
1 hello_orig(g = g, n = n, p = p, S = S, zc = z, dl = dl) 14.2s 14.2s 0.0703 58.7MB
2 hello(g = g, n = n, p = p, S = S, zc = z, dl = dl) 53.9ms 85.9ms 11.1 58.7MB
# … with 8 more variables: `gc/sec` <dbl>, n_itr <int>, n_gc <dbl>, total_time <bch:tm>,
# result <list>, memory <list>, time <list>, gc <list>
More than a factor 100 faster!
You can get cleaner (and maybe even a bit faster code) by floowing @coatless' suggestions in the comments to use an arma::cube
. The most compact form will give you a different return structure, though. Instead of a list of p x n
you will get a p x n x n
array:
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::export]]
arma::cube coatless(
const arma::rowvec& g,
const int& n,
const int& p,
const arma::mat& S,
const arma::mat& zc,
const arma::rowvec& dl){
arma::cube ht(p, n, n);
ht.zeros();
for(int i = 0;i < n; ++i){
arma::mat h = exp(arma::as_scalar(g*zc.row(i).t())) * (zc.row(i).t() - S.col(i))*dl(i);
for(int t=i+1; t < n;++t){
h += exp(arma::as_scalar(g*zc.row(i).t())) * (zc.row(i).t() - S.col(t))*dl(t);
ht.slice(t).col(i) = h;
}
}
return(ht);
}
obtim_lbfgs function in Rcppnumerical and RcppEigen with Rcpparmadillo
The rest of the error message that I am getting when compiling your code is of interest:
optim_num.cpp: In function ‘Rcpp::NumericVector aftsrr_bfgs(arma::vec, arma::vec, arma::mat, arma::vec)’:
optim_num.cpp:86:16: error: cannot declare variable ‘obj’ to be of abstract type ‘socreftn_mns’
socreftn_mns obj(TIME, DELTA, COVARI, TARGETVEC);
^~~
optim_num.cpp:11:7: note: because the following virtual functions are pure within ‘socreftn_mns’:
class socreftn_mns: public MFuncGrad
^~~~~~~~~~~~
In file included from /usr/local/lib/R/site-library/RcppNumerical/include/integration/wrapper.h:13:0,
from /usr/local/lib/R/site-library/RcppNumerical/include/RcppNumerical.h:16,
from optim_num.cpp:6:
/usr/local/lib/R/site-library/RcppNumerical/include/integration/../Func.h:52:20: note: virtual double Numer::MFuncGrad::f_grad(Numer::Constvec&, Numer::Refvec)
virtual double f_grad(Constvec& x, Refvec grad) = 0;
^~~~~~
In essence that means that your class socreftn_mns
is derived from the abstract class MFuncGrad
. This class is abstract since it does not contain a definition for the method f_grad(Constvec& x, Refvec grad)
. You try to define this by defining the method f_grad(arma::vec& b_s, arma::vec grad)
, but due to the different function signature, the virtual function is not overloaded. Hence your class is also abstract.
If you use the same signature, things should work out. The required types are defined in terms of Eigen objects:
// Reference to a vector
typedef Eigen::Ref<Eigen::VectorXd> Refvec;
typedef const Eigen::Ref<const Eigen::VectorXd> Constvec;
So you will have to convert back and forth between Aramdillo and Eigen constructs.
Related Topics
What/Where Are the Attributes of a Function Object
Setting Ld_Library_Path from Inside R
How to Find the First and Last Occurrences of an Element in a Data.Frame
How to Replace Numeric Codes with Value Labels from a Lookup Table
Why Is the Class of a Vector the Class of the Elements of the Vector and Not Vector Itself
Add Na Value to Ggplot Legend for Continuous Data Map
Ggplot2: Add P-Values to the Plot
Fill Area Between Two Lines, with High/Low and Dates
Using Mean with .Sd and .Sdcols in Data.Table
Observeevent Shiny Function Used in a Module Does Not Work
Access Data.Table Columns with Strings
How Calculate Growth Rate in Long Format Data Frame
How to Make Discrete Gradient Color Bar with Geom_Contour_Filled
Ggplot2:How to Reduce the Width and the Space Between Bars with Geom_Bar
Detecting Cycle Maxima (Peaks) in Noisy Time Series (In R)
R How to Extract First Row of Each Matrix Within a List
Extracting Coefficients and Their Standard Error for Each Unit in an Lme Model Fit