How to Learn How to Write C Code to Speed Up Slow R Functions

Where can I learn how to write C code to speed up slow R functions?

Well, there is the good old "Use the source, Luke!" approach: R itself contains plenty of (very efficient) C code one can study, and CRAN has hundreds of packages, some from authors you trust. That provides real, tested examples to study and adapt.

But as Josh suspected, I lean more towards C++ and hence Rcpp. It also has plenty of examples.
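For a first taste of that workflow, here is a minimal sketch using the customary introductory example: save it as, say, times_two.cpp and load it with Rcpp::sourceCpp("times_two.cpp"):

#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::NumericVector timesTwo(Rcpp::NumericVector x) {
  return x * 2; // Rcpp sugar: element-wise multiplication
}

/*** R
timesTwo(c(1, 2, 3)) # 2 4 6
*/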

Edit: There were two books I found helpful:

  • The first is Venables and Ripley's "S Programming", even though it is getting long in the tooth (and there have been rumours of a second edition for years). At the time, there was simply nothing else.
  • The second is Chambers' "Software for Data Analysis", which is much more recent and has a much nicer R-centric feel -- and two chapters on extending R. Both C and C++ get mentioned. Plus, John shreds me for what I did with digest, so that alone is worth the price of admission.

That said, John is growing fond of Rcpp (and contributing) as he finds the match between R objects and C++ objects (via Rcpp) to be very natural -- and ReferenceClasses help there.

Edit 2: With Hadley's refocussed question, I very strongly urge you to consider C++. There is so much boilerplate nonsense you have to do with C---very tedious and very avoidable. Have a look at the Rcpp-introduction vignette. Another simple example is this blog post where I show that instead of worrying about 10% differences (in one of the Radford Neal examples) we can get eightyfold increases with C++ (on what is of course a contrived example).

Edit 3: There is complexity in that you may run into C++ error messages that are, to put it mildly, hard to grok. But to just use Rcpp, rather than to extend it, you should hardly ever encounter them. And while this cost is undeniable, it is far eclipsed by the benefit of simpler code, less boilerplate, no PROTECT/UNPROTECT, no manual memory management, and so on. Doug Bates just yesterday stated that he finds C++ with Rcpp to be much more like writing R than writing C++. YMMV and all that.
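To make the boilerplate point concrete, here is a hedged, illustrative contrast: the same toy function (add 1.0 to each element of a numeric vector) written once against the plain .Call C API and once with Rcpp. The function names are made up, and both versions compile as C++:

#include <Rcpp.h>

// Plain .Call C API: manual allocation plus PROTECT/UNPROTECT bookkeeping.
extern "C" SEXP add_one_c(SEXP x) {
  R_xlen_t n = Rf_xlength(x);
  SEXP out = PROTECT(Rf_allocVector(REALSXP, n)); // shield from the GC
  for (R_xlen_t i = 0; i < n; ++i)
    REAL(out)[i] = REAL(x)[i] + 1.0;
  UNPROTECT(1);                                   // balance the protect stack
  return out;
}

// The same thing with Rcpp: no manual protection or memory management.
// [[Rcpp::export]]
Rcpp::NumericVector add_one_rcpp(Rcpp::NumericVector x) {
  return x + 1.0;                                 // Rcpp sugar: vectorised add
}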

Can I speed up my R code with Rcpp?

The answer by 李哲源 correctly identifies what should be done in your case. As for your original question, the answer is two-fold: yes, you can move the loop to C++ with Rcpp; no, you won't gain performance:

#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::NumericMatrix fillMatrix(Rcpp::NumericMatrix X,
                               Rcpp::NumericVector y,
                               Rcpp::NumericVector a,
                               Rcpp::Function f) {
  Rcpp::NumericMatrix result = Rcpp::no_init(X.cols(), a.length());
  for (int i = 0; i < a.length(); ++i) {
    result(Rcpp::_, i) = Rcpp::as<Rcpp::NumericVector>(f(X, y, a[i]));
  }
  return result;
}

/*** R
f <- function(X, y, a) {
  p <- ncol(X)
  res <- (crossprod(X) + a * diag(1, p)) %*% crossprod(X, y)
  res
}

X <- matrix(rnorm(500 * 50), 500, 50)
y <- rnorm(500)
a <- seq(0, 1, 0.01)

system.time(fillMatrix(X, y, a, f))
#  user  system elapsed
# 0.052   0.077   0.075
system.time({
  result <- matrix(NA, ncol(X), length(a))
  for (i in 1:length(a)) {
    result[, i] <- f(X, y, a[i])
  }
})
#  user  system elapsed
# 0.060   0.037   0.049
*/

So the Rcpp solution is actually slower than the R solution in this case. Why? Because the real work is done within the function f. This is the same for both solutions, but the Rcpp solution pays the additional overhead of calling back into R from C++. Note that for loops in R are not necessarily slow. By the way, here is some benchmark data:

          expr      min       lq     mean   median       uq      max neval
 fillMatrixR() 41.22305 41.86880 46.16806 45.20537 49.11250 65.03886   100
 fillMatrixC() 44.57131 44.90617 49.76092 50.99102 52.89444 66.82156   100
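The callback overhead only disappears once f's own work moves into C++. As a hedged sketch (the name fillMatrixArma is made up), here is the same computation via RcppArmadillo, which also hoists the two cross-products out of the loop that the R version recomputes for every value of a:

// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>

// [[Rcpp::export]]
arma::mat fillMatrixArma(const arma::mat& X,
                         const arma::vec& y,
                         const arma::vec& a) {
  arma::mat XtX = X.t() * X; // crossprod(X), computed once
  arma::vec Xty = X.t() * y; // crossprod(X, y), computed once
  arma::mat result(X.n_cols, a.n_elem);
  for (arma::uword i = 0; i < a.n_elem; ++i) {
    arma::mat M = XtX;
    M.diag() += a(i);        // X'X + a_i * I
    result.col(i) = M * Xty; // same formula as f() above
  }
  return result;
}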

Increase efficiency and speed of R function

OK, here's the Rcpp solution, and as expected, it beats the R solution by a lot:

rcppFun <- "
CharacterVector fcpp(CharacterVector a, CharacterVector b, int size) {
  int aSkipped = 0;
  int bSkipped = 0;
  int asize = a.size();
  Rcpp::CharacterVector d(size);
  for (int i = 0; i < size; i++) {
    if (i - aSkipped < asize && a[i - aSkipped][0] == 'D') {
      d[i] = \"D\";
      bSkipped++;
    } else if (b[i - bSkipped][0] == 'D') {
      d[i] = \"I\";
      aSkipped++;
    } else if (a[i - aSkipped][0] == 'I') {
      d[i] = \"I\";
    } else if (b[i - bSkipped][0] == 'I') {
      d[i] = \"D\";
    } else {
      d[i] = a[i - aSkipped];
    }
  }
  return d;
}"
require("Rcpp")
fcpp <- cppFunction(rcppFun)

f3 <- function(a, b) {
  fcpp(a, b, as.integer(length(a) + sum(b == "D")))
}

Warning: this function does no parameter checking at all, so if you feed it bad data you can easily get a segfault.
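If you want at least a cheap guard, here is a hedged sketch of a helper one could call at the top of fcpp (the name checkSize is made up; the bound is the loosest one that still rules out the obvious out-of-bounds reads):

#include <Rcpp.h>

// The merged output can never exceed length(a) + length(b), so any
// larger 'size' guarantees an out-of-bounds read in the main loop.
void checkSize(Rcpp::CharacterVector a, Rcpp::CharacterVector b, int size) {
  if (size < 0 || size > a.size() + b.size())
    Rcpp::stop("'size' must lie between 0 and length(a) + length(b)");
}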

If you are going to be calling this a lot, Rcpp is definitely the way to go:

> with(ab(10), microbenchmark(f(a,b), f3(a,b), f2(a,b), my.function.v(a,b)))
Unit: microseconds
                expr     min       lq   median       uq     max neval
             f(a, b) 103.993 107.5155 108.6815 109.7455 178.801   100
            f3(a, b)   7.354   8.1305   8.5575   9.1220  18.014   100
            f2(a, b)  87.081  90.4150  92.2730  94.2585 146.502   100
 my.function.v(a, b)  84.389  86.5140  87.6090  88.8340 109.106   100
> with(ab(100), microbenchmark(f(a,b), f3(a,b), f2(a,b), my.function.v(a,b)))
Unit: microseconds
                expr     min        lq    median        uq      max neval
             f(a, b) 992.082 1018.9850 1032.0180 1071.0690 2784.710   100
            f3(a, b)  12.873   14.3605   14.7370   15.5095   35.582   100
            f2(a, b) 119.396  125.4405  129.3015  134.9915 1909.930   100
 my.function.v(a, b) 769.618  786.7865  802.2920  824.0820  905.737   100
> with(ab(1000), microbenchmark(f(a,b), f3(a,b), f2(a,b), my.function.v(a,b)))
Unit: microseconds
                expr      min        lq     median        uq       max neval
             f(a, b) 9816.295 10065.065 10233.1350 10392.696 12383.373   100
            f3(a, b)   66.057    67.869    83.9075    87.231  1167.086   100
            f2(a, b) 1637.972  1760.258  2667.6985  3138.229 47610.317   100
 my.function.v(a, b) 9692.885 10272.425 10997.2595 11402.602 54315.922   100
> with(ab(10000), microbenchmark(f(a,b), f3(a,b), f2(a,b)))
Unit: microseconds
     expr        min         lq      median          uq        max neval
  f(a, b) 101644.922 103311.678 105185.5955 108342.4960 144620.777   100
 f3(a, b)    607.702    610.039    669.8515    678.1845    785.415   100
 f2(a, b) 221305.641 247952.345 254478.1580 341195.5510 656408.378   100

Speed up simple R code (vectorize?)

You could count, for each index, how many ranges start there and how many end there, and then take a cumulative sum over the difference of these counts.

  1. Aggregate the number of ranges that start at each index
  2. Aggregate the number of ranges that end at one position before each index (if ends are inclusive)
  3. Calculate the net change: count of starts - count of ends
  4. Loop over the indexes and cumulatively sum the net changes. This gives, at each index, the number of ranges that started at or before it and have not yet ended.

The "covered" number is equal to this cumulative sum at each index.

I tried this approach using sparse vectors to cut down on memory usage, although it may well be faster with dense vectors; I have not tested that. With sparseVector it was about 5.7x faster than the loop approach for the example below.

library(Matrix)

set.seed(123)

starts <- sample(10^6, replace = TRUE)
ends <- starts + sample(100:1000, length(starts), replace = TRUE)

v.cov <- NULL
fun1 <- function() {
  coverage <- integer(max(ends))
  for (i in seq(length(starts))) {
    coverage[starts[i]:ends[i]] <- coverage[starts[i]:ends[i]] + 1
  }
  v.cov <<- coverage
}
# Testing "for loop" approach
system.time(fun1())
#  user  system elapsed
# 21.84    0.00   21.83

v.sum <- NULL
fun2 <- function() {
  # 1. Aggregate the number of ranges that start at each index
  t.starts <- table(starts)
  i.starts <- strtoi(names(t.starts))
  x.starts <- as.vector(t.starts)
  sv.starts <- sparseVector(x = x.starts, i = i.starts, length = max(ends) + 1) # to match length of sv.ends below
  # 2. Aggregate the number of ranges that end at one position before each index
  t.ends <- table(ends)
  i.ends <- strtoi(names(t.ends)) + 1 # because "ends" are inclusive
  x.ends <- as.vector(t.ends)
  sv.ends <- sparseVector(x = x.ends, i = i.ends, length = max(ends) + 1)

  sv.diff <- sv.starts - sv.ends
  v.sum <<- cumsum(sv.diff)[1:max(ends)] # drop last element
}
# Testing "cumulative sum" approach
system.time(fun2())
#  user  system elapsed
# 3.828   0.000   3.823

identical(v.cov, v.sum)
# TRUE

Also, there is probably a better way to extract the x and i values for the sparseVector constructor than using table and strtoi(names(x)); that might boost speed further.

EDIT

Avoid strtoi by using a one-column sparseMatrix instead:

v.sum.mat <- NULL
fun3 <- function() {
  v.ones <- rep(1, length(starts))
  m.starts <- sparseMatrix(i = starts, j = v.ones, x = v.ones, dims = c(max(ends) + 1, 1))
  m.ends <- sparseMatrix(i = ends + 1, j = v.ones, x = v.ones, dims = c(max(ends) + 1, 1))
  m.diff <- m.starts - m.ends
  v.sum.mat <<- cumsum(m.diff[, 1])[1:max(ends)]
}
# Testing "cumulative sum" approach using matrix
system.time(fun3())
#  user  system elapsed
# 0.456   0.028   0.486

identical(v.cov, v.sum.mat)
# TRUE

EDIT 2 - super fast, super short

Based on a comment by @alexis_laz, thank you!

fun4 <- function() {
  cumsum(tabulate(starts, max(ends) + 1L) - tabulate(ends + 1L, max(ends) + 1L))[1:max(ends)]
}
system.time(v.sum.tab <- fun4())
# user system elapsed
# 0.040 0.000 0.041

identical(as.integer(v.cov), v.sum.tab)
# TRUE
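And since this page is all about moving hot loops to C++: the same start/end counting is a single linear pass in Rcpp as well. A hedged sketch with the made-up name coverage_cpp (note the mapping of 1-based R indices to 0-based C++ indices):

#include <Rcpp.h>

// [[Rcpp::export]]
Rcpp::IntegerVector coverage_cpp(Rcpp::IntegerVector starts,
                                 Rcpp::IntegerVector ends) {
  int n = Rcpp::max(ends);
  Rcpp::IntegerVector delta(n + 1); // net change at each index
  for (int i = 0; i < starts.size(); ++i) {
    delta[starts[i] - 1] += 1;      // a range opens at its start
    delta[ends[i]] -= 1;            // and closes just after its (inclusive) end
  }
  Rcpp::IntegerVector cov(n);
  int running = 0;                  // running sum = current coverage
  for (int i = 0; i < n; ++i) {
    running += delta[i];
    cov[i] = running;
  }
  return cov;
}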

Rcpp function to be SLOWER than same R function

Others have answered in the comments already, so I'll just emphasize the point: calling back to R functions is expensive because we need to be extra careful about error handling. Just having the loop in C++ while still calling R functions is not rewriting your code in C++. Try rewriting psi and pFGM as C++ functions and report back here what happens.

You might argue that you lose some flexibility that way, since you can no longer pass in arbitrary R functions. For situations like this, I'd advise a hybrid solution where you implement the most common cases in C++ and fall back to an R solution otherwise, as sketched below.
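A hedged sketch of such a hybrid (all names here are illustrative): a native fast path for one known case, plus a fallback to an arbitrary R function for everything else:

#include <Rcpp.h>
#include <cmath>

// [[Rcpp::export]]
Rcpp::NumericVector hybrid_apply(Rcpp::NumericVector x,
                                 std::string method,
                                 Rcpp::Function fallback) {
  Rcpp::NumericVector out(x.size());
  if (method == "exp") {
    // common case implemented natively: no per-element R overhead
    for (int i = 0; i < x.size(); ++i) out[i] = std::exp(x[i]);
  } else {
    // anything else: pay the R-callback cost once per call
    out = Rcpp::as<Rcpp::NumericVector>(fallback(x));
  }
  return out;
}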

As for the other question, a SEXP is an R object; it is part of the R API and can be anything. When you create a Function from it (as is done implicitly for you when you write a function that takes a Function argument), you are guaranteed that it is indeed an R function. The overhead is very small, but the gain in terms of the expressiveness of your code is huge.
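As a hedged illustration (the name call_if_function is made up): a SEXP arrives untyped, and you can verify what it holds before wrapping it:

#include <Rcpp.h>

// [[Rcpp::export]]
SEXP call_if_function(SEXP obj) {
  if (!Rf_isFunction(obj))
    Rcpp::stop("'obj' is not an R function");
  Rcpp::Function f(obj); // constructing Function would also reject a non-function
  return f();            // call it with no arguments
}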


