Will installing BLAS/ATLAS/MKL/OpenBLAS speed up an R package that is written in C/C++?
It is frequently stated, including in a comment here, that "you have to recompile R" to use a different BLAS or LAPACK library. That is wrong.
You do not have to recompile R, provided it is built against the shared-library versions of BLAS and LAPACK.
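To see which shared libraries a running R session is actually using, something like the following minimal sketch may help. The helper name blas_lapack_info is made up for illustration, and it assumes a reasonably recent R in which sessionInfo() carries BLAS and LAPACK fields; on older versions those entries simply come back as NA.

```r
# Hypothetical helper: report the BLAS/LAPACK libraries seen by this R session.
blas_lapack_info <- function() {
  si <- sessionInfo()
  list(
    # Paths of the shared libraries, if this R version records them
    blas   = if (is.null(si$BLAS)) NA_character_ else si$BLAS,
    lapack = if (is.null(si$LAPACK)) NA_character_ else si$LAPACK,
    # Version string of the LAPACK library in use
    lapack_version = La_version()
  )
}

info <- blas_lapack_info()
str(info)
```

Running this before and after installing a different BLAS package should show the reported paths change, without R itself having been rebuilt.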
I have a package and vignette on CRAN which uses this fact to provide a benchmarking framework in which different BLAS and LAPACK versions are timed against each other just by installing different ones (one command on Debian/Ubuntu) and running benchmarks. This is so straightforward that it can be automated in a package such as this.
The results in that package will give an idea of the possible speed differences. Exactly how they pan out depends on your computer, your data (size), your problem, etc. But if, say, your problem uses LAPACK functions which can benefit from running multithreaded, then installing OpenBLAS may help. That is true for any R package using LAPACK, as they all use the same LAPACK installation accessed through R, and these can be changed.
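For a rough feel of what such a comparison measures, here is a small timing sketch. It is not the benchmarking package mentioned above; the function name time_blas_lapack, the matrix size and the choice of operations are assumptions made for illustration only.

```r
# Hypothetical helper: time a few BLAS- and LAPACK-backed operations.
time_blas_lapack <- function(n = 500, reps = 3) {
  set.seed(42)
  X <- matrix(rnorm(n * n), n)
  S <- crossprod(X) + diag(n)   # symmetric positive definite, needed for chol()

  # Best elapsed time over `reps` runs of a zero-argument function
  elapsed <- function(f) {
    min(replicate(reps, system.time(f())[["elapsed"]]))
  }

  c(
    crossprod = elapsed(function() crossprod(X)),  # BLAS-backed matrix product
    matmul    = elapsed(function() X %*% X),       # BLAS-backed matrix product
    chol      = elapsed(function() chol(S)),       # LAPACK-backed factorization
    svd       = elapsed(function() svd(X))         # LAPACK-backed decomposition
  )
}

timings <- time_blas_lapack(n = 200, reps = 2)
print(timings)
```

Running the same call once per installed BLAS/LAPACK and comparing the vectors gives a crude version of the comparison; the numbers will vary with hardware and problem size.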
Large performance differences between OS for matrix computation
tldr: CentOS uses single-threaded OpenBLAS, Linux Mint uses Reference BLAS by default but can use other BLAS versions.
The R packages for CentOS available from EPEL depend on openblas-Rblas. This seems to be an OpenBLAS build providing BLAS for R. So while it looks like R's BLAS is used, it actually is OpenBLAS. The LAPACK version is always the one provided by R.
On Debian and derived distributions like Mint, r-base-core depends on:
- libblas3 | libblas.so.3
- liblapack3 | liblapack.so.3
By default these are provided by the reference implementations libblas3 and liblapack3. These are not particularly fast, but you can replace them easily by installing packages like libopenblas-base. You have control over the BLAS and LAPACK used on your system via update-alternatives.
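The current selection can also be inspected from within R. The sketch below shells out to update-alternatives with its --display option. The helper name show_blas_alternatives is hypothetical, and the default alternative name is an assumption: newer Debian-based releases appear to use an architecture-suffixed name such as libblas.so.3-x86_64-linux-gnu, while older ones use plain libblas.so.3, so adjust it for your system.

```r
# Hypothetical helper: display the alternatives registered for the BLAS library.
# The default `name` is an assumption and differs between releases/architectures.
show_blas_alternatives <- function(name = "libblas.so.3-x86_64-linux-gnu") {
  exe <- unname(Sys.which("update-alternatives"))
  if (!nzchar(exe)) {
    message("update-alternatives not found; probably not a Debian-like system")
    return(invisible(character(0)))
  }
  # --display only reads the current state, so no root privileges are needed
  out <- suppressWarnings(
    system2(exe, c("--display", name), stdout = TRUE, stderr = TRUE)
  )
  writeLines(out)
  invisible(as.character(out))
}

alts <- show_blas_alternatives()
```

Actually switching the implementation is done outside R, as root, with update-alternatives --config followed by the same alternative name.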
For controlling the number of threads with OpenBLAS I normally use RhpcBLASctl:
# Simulate a tall 20000 x 2000 matrix
N <- 20000
M <- 2000
X <- matrix(rnorm(N * M), N)
# Two BLAS threads
RhpcBLASctl::blas_set_num_threads(2)
system.time(crossprod(X))
#>    user  system elapsed
#>   2.492   0.331   1.339
# One BLAS thread
RhpcBLASctl::blas_set_num_threads(1)
system.time(crossprod(X))
#>    user  system elapsed
#>   2.319   0.052   2.316
For some reason, setting the environment variables OPENBLAS_NUM_THREADS, GOTO_NUM_THREADS or OMP_NUM_THREADS from R does not have the desired effect. On CentOS even RhpcBLASctl does not help, since the OpenBLAS build used there is single-threaded.
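One possible explanation, offered here as an assumption rather than something the answer establishes, is that OpenBLAS reads these variables only once, when the library is loaded. If that is right, the variable has to be in place before R starts. The sketch below launches a fresh Rscript process with OPENBLAS_NUM_THREADS set in its environment; the helper name run_with_blas_threads is hypothetical.

```r
# Hypothetical helper: evaluate R code in a fresh process whose environment
# already contains OPENBLAS_NUM_THREADS when the BLAS library gets loaded.
run_with_blas_threads <- function(expr_text, threads = 1L) {
  rscript <- file.path(R.home("bin"), "Rscript")
  system2(
    rscript,
    c("-e", shQuote(expr_text)),
    env = paste0("OPENBLAS_NUM_THREADS=", threads),
    stdout = TRUE
  )
}

# The child process reports the value it sees in its own environment
out <- run_with_blas_threads(
  'writeLines(Sys.getenv("OPENBLAS_NUM_THREADS"))',
  threads = 1L
)
print(out)
```

The same effect can be had by exporting the variable in the shell before starting R. Whether it changes timings still depends on the installed OpenBLAS being a multithreaded build.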