How to Let R Use All the Cores of the Computer

How can I let R use all the cores of the computer?

For starters, see the High Performance Computing Task View on CRAN, which lists packages that can be used to support parallel computing on a single machine.

Since R version 2.14.0 there has been built-in support for parallel computing via the parallel package, which includes slightly modified versions of the earlier snow and multicore packages. The parallel package has a vignette that you should read. You can view it using:

vignette(package = "parallel", topic = "parallel")
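
For example, here is a minimal sketch of driving multiple cores with the parallel package directly (the workload is invented purely for illustration):

library(parallel)

n_cores <- max(1L, detectCores() - 1L)  # leave one core for the OS

cl <- makeCluster(n_cores)  # a socket cluster works on all platforms

# Run an illustrative task on the workers in parallel.
res <- parLapply(cl, 1:100, function(i) mean(rnorm(1e5)))

stopCluster(cl)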

There are other ways to exploit your multiple cores, for example via use of a multi-threaded BLAS for linear algebra computations.
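
To check which BLAS/LAPACK your R is linked against (reported by sessionInfo() from R 3.4.0 onwards), and to get a rough feel for whether large linear algebra is multi-threaded, something like this works:

# Paths of the BLAS/LAPACK libraries R is linked against (R >= 3.4.0).
sessionInfo()$BLAS
sessionInfo()$LAPACK

# A large matrix multiply only occupies all cores if a multi-threaded
# BLAS (e.g. OpenBLAS or MKL) is installed; watch a CPU monitor while it runs.
m <- matrix(rnorm(4e6), nrow = 2000)
system.time(m %*% m)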

Whether any of this will speed up the "statistics calculations" you want to do depends on what those calculations are. Spawning multiple threads or workers entails an overhead cost to set them up, manage them, and collect the results. Some operations see a benefit (some large, some small) from using multiple cores/threads; others are slowed down by the extra overhead.

In short, do not expect an n-fold decrease in your compute time from using n cores instead of just 1.
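
To see the overhead in practice, compare serial lapply() with a parallel equivalent on both an expensive and a trivial task (a rough sketch for Unix-alikes; mclapply() relies on forking, which is not supported on Windows):

library(parallel)

costly <- function(i) mean(rnorm(1e6))  # real work: extra cores should help
cheap  <- function(i) i + 1             # trivial work: overhead dominates

system.time(lapply(1:100, costly))
system.time(mclapply(1:100, costly, mc.cores = 4))   # usually faster

system.time(lapply(1:50000, cheap))
system.time(mclapply(1:50000, cheap, mc.cores = 4))  # often slower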

R: How to Check Available Cores and CPU Usage

On Linux you can call the ps command via system(): it gives you the average CPU usage and the memory usage of every process called rsession:

# Ask ps for the CPU, memory, PID and command of every rsession process.
out <- system("ps -C rsession -o %cpu,%mem,pid,cmd", intern = TRUE)

# Split each line on whitespace, dropping the header row and empty fields.
splitted <- lapply(strsplit(out[-1], " "), function(x) x[x != ""])

df <- do.call(rbind, lapply(splitted, function(x) data.frame(
  cpu = as.numeric(x[1]),
  mem = as.numeric(x[2]),
  pid = as.integer(x[3]),
  cmd = paste(x[-(1:3)], collapse = " "))))
df
# cpu mem pid cmd
#1 0.8 0.7 11001 /usr/lib/rstudio/bin/rsession
#2 0.0 0.2 12397 /usr/lib/rstudio/bin/rsession
#3 0.1 0.7 14960 /usr/lib/rstudio/bin/rsession
#4 0.4 0.2 26122 /usr/lib/rstudio-server/bin/rsession
#5 0.3 8.3 35782 /usr/lib/rstudio/bin/rsession

You can probably improve this to get the parent ID and the instantaneous CPU usage by passing other options to ps or top, and deduce from that the number of cores used by each session.
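
For example, a sketch of both ideas (the exact options and column names may vary; check man ps and man top on your system):

# Add the parent PID to the ps output.
system("ps -C rsession -o %cpu,%mem,pid,ppid,cmd", intern = TRUE)

# top in batch mode reports an instantaneous snapshot rather than
# the lifetime average that ps gives you.
system("top -b -n 1 | grep rsession", intern = TRUE)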

On Windows you can try this:

a <- system("wmic path Win32_PerfFormattedData_PerfProc_Process get Name,PercentProcessorTime",
            intern = TRUE)
df <- do.call(rbind, lapply(strsplit(a, " "), function(x) {
  x <- x[x != ""]  # drop the empty fields left by the fixed-width output
  data.frame(process = x[1], cpu = x[2])
}))
df[grepl("Rgui|rstudio", df$process), ]
# process cpu
# 105 Rgui 0
# 108 rstudio 0
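
For the other half of the question (how many cores are available), the parallel package gives the answer on both platforms:

library(parallel)
detectCores()                 # logical cores (hyper-threads included)
detectCores(logical = FALSE)  # physical cores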

Unlimiting the CPU usage from R

R runs single-threaded by default, using only one core of the CPU, which is a pity if you have a machine with 16 or 32 cores. By "unlimiting the CPU usage", I have to assume you're asking whether there is any way to have an R process (let's say part of the k-means algorithm) take advantage of your full CPU power by running in parallel.

Many R packages and processes are not going to be helped by parallel processing, though, so the technical solution to your particular problem comes down to the implementation of the package you're using. Popular packages like caret do support parallelization when that's possible, even though you may need to add an additional allowParallel = TRUE parameter. They work in conjunction with a backend library such as doMC to allow multi-core processing. In the following sample code, I have my machine use 8 cores through registerDoMC(8) and then set allowParallel = TRUE.

library(caret)  # train() and trainControl() come from caret
library(doMC)

registerDoMC(8)  # register 8 cores with the foreach backend

system.time({
  ctrl_2 <- trainControl(method = "cv", number = 3, allowParallel = TRUE)
  fb_forest_2 <- train(classe ~ ., data = fb_train, method = "rf",
                       trControl = ctrl_2)
})

Again, parallel processing doesn't always help: not all processes can be parallelized! The documentation for foreach is a great read, so if you can afford the time, take a look at it. The specific code solution for your problem also depends on the library implementation you're using.
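
As a minimal sketch of the foreach pattern that backends like doMC plug into (the workload here is invented for illustration):

library(doMC)
library(foreach)

registerDoMC(4)  # register 4 workers for %dopar%

# Each iteration runs on a worker; .combine = c collects the results.
res <- foreach(i = 1:8, .combine = c) %dopar% {
  mean(rnorm(1e5))
}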


