Without root access, run R with tuned BLAS when it is linked with reference BLAS
why my way does not work
First, shared libraries on UNIX are designed to mimic the way archive libraries work (archive libraries were there first). In particular, that means that if you have libfoo.so and libbar.so, both defining symbol foo, then whichever library is loaded first is the one that wins: all references to foo from anywhere within the program (including from libbar.so) will bind to libfoo.so's definition of foo.
This mimics what would happen if you linked your program against libfoo.a and libbar.a, where both archive libraries defined the same symbol foo. More info on archive linking here.
It should be clear from the above that if libblas.so.3 and libopenblas.so.0 define the same set of symbols (which they do), and if libblas.so.3 is loaded into the process first, then routines from libopenblas.so.0 will never be called.
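The first-loaded-wins rule can be reproduced with two toy libraries. This is a minimal sketch, assuming an ELF system such as Linux with a C compiler available as cc; the function name interposition_demo and the file names are made up for illustration.

```shell
# Sketch: two shared libraries both define foo(); the one listed first at
# link time is loaded first and wins, even for the call made inside libbar.so.
# Assumes a C compiler is available as `cc` on an ELF system (e.g. Linux).
interposition_demo() {
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        printf '%s\n' 'const char *foo(void) { return "libfoo"; }' > libfoo.c
        printf '%s\n' \
            'const char *foo(void) { return "libbar"; }' \
            'const char *bar(void) { return foo(); }' > libbar.c
        printf '%s\n' \
            '#include <stdio.h>' \
            'const char *foo(void);' \
            'const char *bar(void);' \
            'int main(void) { printf("foo: %s\nbar: %s\n", foo(), bar()); return 0; }' > main.c
        cc -shared -fPIC -o libfoo.so libfoo.c &&
        cc -shared -fPIC -o libbar.so libbar.c &&
        cc -o main main.c -L. -lfoo -lbar &&
        LD_LIBRARY_PATH=. ./main
    )
    status=$?
    rm -rf "$dir"      # remove the scratch directory
    return "$status"
}

# On Linux with glibc this is expected to print:
#   foo: libfoo
#   bar: libfoo
# interposition_demo
```

The second output line is the interesting one: bar() lives in libbar.so, yet its call to foo() binds to the definition in libfoo.so, which is the situation libopenblas.so.0 is in when libblas.so.3 is already loaded.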
Second, you've correctly decided that since R directly links against libR.so, and since libR.so directly links against libblas.so.3, it is guaranteed that libopenblas.so.0 will lose the battle.
However, you erroneously decided that Rscript is better, but it's not: Rscript is a tiny binary (11K on my system; compare to 2.4MB for libR.so), and approximately all it does is exec R. This is trivial to see in strace output:
strace -e trace=execve /usr/bin/Rscript --default-packages=base --vanilla /dev/null
execve("/usr/bin/Rscript", ["/usr/bin/Rscript", "--default-packages=base", "--vanilla", "/dev/null"], [/* 42 vars */]) = 0
execve("/usr/lib/R/bin/R", ["/usr/lib/R/bin/R", "--slave", "--no-restore", "--vanilla", "--file=/dev/null", "--args"], [/* 43 vars */]) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=89625, si_status=0, si_utime=0, si_stime=0} ---
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=89626, si_status=0, si_utime=0, si_stime=0} ---
execve("/usr/lib/R/bin/exec/R", ["/usr/lib/R/bin/exec/R", "--slave", "--no-restore", "--vanilla", "--file=/dev/null", "--args"], [/* 51 vars */]) = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=89630, si_status=0, si_utime=0, si_stime=0} ---
+++ exited with 0 +++
This means that by the time your script starts executing, libblas.so.3 has already been loaded, and libopenblas.so.0, which will be loaded as a dependency of mmperf.so, will not actually be used for anything.
is it possible at all to make it work
Probably. I can think of two possible solutions:
- Pretend that libopenblas.so.0 is actually libblas.so.3.
- Rebuild the entire R package against libopenblas.so.
For #1, you need to ln -s libopenblas.so.0 libblas.so.3, then make sure that your copy of libblas.so.3 is found before the system one, by setting LD_LIBRARY_PATH appropriately.
This appears to work for me:
mkdir /tmp/libblas
# pretend that libc.so.6 is really libblas.so.3
cp /lib/x86_64-linux-gnu/libc.so.6 /tmp/libblas/libblas.so.3
LD_LIBRARY_PATH=/tmp/libblas /usr/bin/Rscript /dev/null
Error in dyn.load(file, DLLpath = DLLpath, ...) :
unable to load shared object '/usr/lib/R/library/stats/libs/stats.so':
/usr/lib/liblapack.so.3: undefined symbol: cgemv_
During startup - Warning message:
package ‘stats’ in options("defaultPackages") was not found
Note how I got an error: my "pretend" libblas.so.3 doesn't define the symbols expected of it, since it's really a copy of libc.so.6. The error itself shows that my copy was loaded instead of the system one.
You can also confirm which version of libblas.so.3 is getting loaded this way:
LD_DEBUG=libs LD_LIBRARY_PATH=/tmp/libblas /usr/bin/Rscript /dev/null |& grep 'libblas\.so\.3'
91533: find library=libblas.so.3 [0]; searching
91533: trying file=/usr/lib/R/lib/libblas.so.3
91533: trying file=/usr/lib/x86_64-linux-gnu/libblas.so.3
91533: trying file=/usr/lib/jvm/java-7-openjdk-amd64/jre/lib/amd64/server/libblas.so.3
91533: trying file=/tmp/libblas/libblas.so.3
91533: calling init: /tmp/libblas/libblas.so.3
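For the real case, the steps for #1 might be wrapped up roughly as follows. This is only a sketch: the helper name make_blas_shim and both paths in the usage comments are assumptions, so substitute wherever your own libopenblas.so.0 was built.

```shell
# Hypothetical helper: create a directory containing a symlink named
# libblas.so.3 that points at a user-built OpenBLAS library.
make_blas_shim() {
    openblas_lib=$1   # e.g. "$HOME/openblas/lib/libopenblas.so.0" (assumed path)
    shim_dir=$2       # e.g. "$HOME/blas-shim" (assumed path)
    if [ ! -e "$openblas_lib" ]; then
        echo "not found: $openblas_lib" >&2
        return 1
    fi
    mkdir -p "$shim_dir" || return 1
    ln -sf "$openblas_lib" "$shim_dir/libblas.so.3"
}

# Usage (paths and script name are assumptions):
#   make_blas_shim "$HOME/openblas/lib/libopenblas.so.0" "$HOME/blas-shim"
#   LD_LIBRARY_PATH="$HOME/blas-shim" Rscript my_script.R
```

Rerunning the LD_DEBUG=libs check shown above with the shim directory in LD_LIBRARY_PATH should tell you whether the symlink, rather than the system libblas.so.3, is the file that actually gets loaded.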
For #2, you said:
I have no root access on machines I want to test, so actual linking to OpenBLAS is impossible.
but that seems to be a bogus argument: if you can build libopenblas, surely you can also build your own version of R.
Update:
You mentioned in the beginning that libblas.so.3 and libopenblas.so.0 define the same symbol, what does this mean? They have different SONAME, is that insufficient to distinguish them by the system?
The symbols and the SONAME have nothing to do with each other.
You can see symbols in the output from readelf -Ws libblas.so.3 and readelf -Ws libopenblas.so.0. Symbols related to BLAS, such as cgemv_, will appear in both libraries.
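To compare the two symbol lists mechanically rather than by eye, something like the following could be used. It is a sketch: the helper names defined_funcs and common_funcs are made up, and it assumes the usual readelf -Ws column layout (Num, Value, Size, Type, Bind, Vis, Ndx, Name).

```shell
# Read `readelf -Ws` output on stdin and print the names of function symbols
# that the library itself defines (global or weak, not undefined imports).
defined_funcs() {
    awk '$4 == "FUNC" && ($5 == "GLOBAL" || $5 == "WEAK") && $7 != "UND" {
             sub(/@.*/, "", $8)   # drop any symbol-version suffix
             print $8
         }' | sort -u
}

# Print the function symbols that both libraries define.
common_funcs() {
    a=$(mktemp) || return 1
    b=$(mktemp) || { rm -f "$a"; return 1; }
    readelf -Ws "$1" | defined_funcs > "$a"
    readelf -Ws "$2" | defined_funcs > "$b"
    comm -12 "$a" "$b"
    rm -f "$a" "$b"
}

# Usage:
#   common_funcs libblas.so.3 libopenblas.so.0 | grep cgemv_
```

Every name this prints is a symbol for which whichever library is loaded first will win.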
Your confusion about SONAME possibly comes from Windows. The DLLs on Windows are designed completely differently. In particular, when FOO.DLL imports symbol bar from BAR.DLL, both the name of the symbol (bar) and the DLL from which that symbol was imported (BAR.DLL) are recorded in FOO.DLL's import table.
That makes it easy to have R import cgemv_ from BLAS.DLL, while MMPERF.DLL imports the same symbol from OPENBLAS.DLL.
However, that makes library interpositioning hard, and works completely differently from the way archive libraries work (even on Windows).
Opinions differ on which design is better overall, but neither system is likely to ever change its model.
There are ways for UNIX to emulate Windows-style symbol binding: see RTLD_DEEPBIND in the dlopen man page. Beware: these are fraught with peril, likely to confuse UNIX experts, are not widely used, and are likely to have implementation bugs.
Update 2:
you mean I compile R and install it under my home directory?
Yes.
Then when I want to invoke it, I should explicitly give the path to my version of executable program, otherwise the one on the system might be invoked instead? Or, can I put this path at the first position of environment variable $PATH to cheat the system?
Either way works.
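Both options might look like this in a shell. This is a sketch under assumptions: $HOME/R-local stands in for whatever --prefix you passed to R's configure script, and the helper name use_private_r is made up.

```shell
# Put a privately installed R first on PATH for the current shell session.
# The prefix argument is whatever directory R was installed under
# (assumed here to be something like "$HOME/R-local").
use_private_r() {
    prefix=$1
    PATH="$prefix/bin:$PATH"
    export PATH
}

# Option 1: invoke the private build by its full path (script name is an assumption).
#   "$HOME/R-local/bin/Rscript" my_script.R
#
# Option 2: prepend it to PATH, then check which R the shell will pick up.
#   use_private_r "$HOME/R-local"
#   command -v R
```

Adding the use_private_r call to your shell startup file would make option 2 persistent; otherwise it only affects the current session.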
R: any faster R function than tcrossprod for symmetric dense matrix multiplication?
No. At R level this is already the fastest. But internally it calls the level-3 BLAS routine dsyrk. So if you have a high-performance BLAS library, this will be a lot faster. Try linking OpenBLAS to your R.
Linking a BLAS library does not require rebuilding R. For an overview, see my question "linking R to BLAS library", which contains several links showing how to set up an alias and then switch between the different BLAS libraries on the machine.
Alternatively, you can read my extremely long question and answer Without root access, run R with tuned BLAS when it is linked with reference BLAS which gives various ways to use an external BLAS library even if R is linked to reference BLAS library.
As a side note, for a matrix with dimension m * n, dsyrk has a FLOP count of n * m ^ 2. (Note: this is the computational cost for tcrossprod. For crossprod it is m * n ^ 2.)
You have m = 5000 and n = 200, and the computation takes 2.96s. Thus, the computation runs at (200 * 5000 ^ 2 / 2.96) * 1e-9 = 1.69 GFLOPs. This is an ordinary level of performance, so at the moment you are almost certainly using reference BLAS. With OpenBLAS, performance can reach 10 GFLOPs or more, depending on your CPU. Good luck!
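The same arithmetic can be rerun for other matrix sizes or timings with a small helper. This is just a sketch of the n * m ^ 2 estimate above; the function name tcrossprod_gflops is made up.

```shell
# Estimate the GFLOPs rate of tcrossprod on an m x n matrix,
# using the n * m^2 FLOP count and a measured wall time in seconds.
tcrossprod_gflops() {
    awk -v m="$1" -v n="$2" -v t="$3" \
        'BEGIN { printf "%.2f\n", n * m * m / t * 1e-9 }'
}

tcrossprod_gflops 5000 200 2.96   # prints 1.69
```

Timing the same call again after switching BLAS libraries and feeding the new time to this function gives a quick before/after comparison.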
BDgraph R package producing different (but consistent) results on different OSs
The problem was solved as of (I believe) version 2.42 of the package.
The issue was with sampling random numbers inside an OpenMP parallel region. Linux and macOS could make use of OpenMP while my build under Windows couldn't, hence the different results under different OSs (the Windows version was the correct one, for reference).
The author of the package figured out the problem and provided a fix, which will be available from the next release at the time of this answer.
How to resolve 'libRblas.so: No such file or directory' during package installation?
Some comments have put me on the right track and helped me to solve the problem—I will briefly summarize.
Since a dependency issue was suspected, I installed the package from which the error message originated (fracdiff in this case) and tried again to install the target package. The error reoccurred, but came from a different package, indicating cascading problems. Weirdly enough, I definitely knew that package was installed, so I felt my initial suspicion confirmed: I might have made a mess of the libs folders when updating R, as described in the OP.
Since I could assume that this would happen again and again, the conclusion was to uninstall R completely, and this time the packages as well, and then reinstall everything. Now I could install the target package among others without any problems.
Fortunately, this is quite easy on Linux, and all packages can be reinstalled relatively unattended. The how-tos are spread out over several threads and sites, so I'll pull the pieces together here, adding the references.
Here is what I did in R and in Bash (you will need su/sudo):
- Store packages (in R) 1
tmp <- installed.packages()
installedpkgs <- as.vector(tmp[is.na(tmp[,"Priority"]), 1])
saveRDS(installedpkgs, 'installed_old.rds')
- Remove R completely 2
dpkg -l | grep ^ii | awk '$2 ~ /^r-/ { print $2 }' | sudo xargs apt-get remove --purge -y
- Remove all R packages 3
The locations might differ from yours.
R -e '.libPaths()'
rm -rf /home/jay/R/x86_64-pc-linux-gnu-library/4.2 /usr/local/lib/R/site-library /usr/lib/R/site-library /usr/lib/R/library
- Install R (here with apt) 4
apt install r-base-core
- Restore R packages 5
This runs for a while. Note that only packages that can be found in repositories are installed.
installedpkgs <- readRDS("installed_old.rds")
tmp <- installed.packages()
installedpkgs.new <- as.vector(tmp[is.na(tmp[,"Priority"]), 1])
missing <- setdiff(installedpkgs, installedpkgs.new)
install.packages(missing)
update.packages(ask=FALSE)