How to let R use all the cores of the computer?
Yes. For starters, see the High Performance Computing Task View on CRAN, which lists packages that support parallel computing, including on a single machine.
From R version 2.14.0, there is inbuilt support for parallel computing via the parallel package, which includes slightly modified versions of the existing snow and multicore packages. The parallel package has a vignette that you should read. You can view it using:
vignette(package="parallel", topic = "parallel")
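As a minimal sketch of what the parallel package offers (the toy workload and the cap of two workers are assumptions made for illustration, not part of the original answer), a socket cluster can be started and used like this:

```r
library(parallel)

# Number of cores R can see; detectCores() may return NA on some platforms
n_cores <- detectCores()
if (is.na(n_cores)) n_cores <- 1L

# Use at most 2 workers here so the sketch stays light (arbitrary choice)
n_workers <- max(1L, min(2L, n_cores))

cl <- makeCluster(n_workers)                            # start worker processes
squares <- parLapply(cl, 1:8, function(i) i^2)          # run the function on the workers
stopCluster(cl)                                         # always shut the workers down

unlist(squares)
```

parLapply() works on every platform because it uses separate worker processes; on Linux and macOS, mclapply() from the same package is a fork-based alternative.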
There are other ways to exploit your multiple cores, for example via use of a multi-threaded BLAS for linear algebra computations.
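A rough way to see which BLAS your R session is linked against, and whether it appears to use several threads, is sketched below. It assumes a reasonably recent R (the BLAS element of sessionInfo() is not present in old versions), and the matrix size is an arbitrary choice:

```r
# Path of the BLAS library in use (NULL on old R versions that do not report it)
si <- sessionInfo()
print(si$BLAS)

# Time a dense matrix product; the size 300 is arbitrary
set.seed(1)
m <- matrix(rnorm(300 * 300), nrow = 300)
timing <- system.time(prod_m <- m %*% m)
print(timing)
```

If the "user" time is clearly larger than the "elapsed" time on bigger matrices, that hints that the BLAS is spreading the work over several threads; with the reference BLAS shipped with R the two are usually close.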
Whether any of this will speed up the "statistics calculations" you want to do will depend on what those "statistics calculations" are. Spawning off multiple threads or workers entails an overhead cost to set them up, manage them and collect the results. Some operations see a benefit (some large, some small) of using multiple cores/threads, others are slowed down because of this extra overhead.
In short, do not expect an n-fold decrease in your compute time from using n cores instead of just 1.
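One common way to put a number on this is Amdahl's law, which bounds the speedup when only a fraction p of the run time can be parallelised. The sketch below is illustrative only: the function name and the values of p are hypothetical, and it ignores the set-up overhead mentioned above, so real speedups will be lower still.

```r
# Amdahl's law: upper bound on the speedup with n cores when a fraction p
# of the run time is parallelisable (p and n below are made-up values)
amdahl_speedup <- function(p, n) 1 / ((1 - p) + p / n)

amdahl_speedup(p = 0.9, n = 8)   # about 4.7, not 8
amdahl_speedup(p = 0.5, n = 8)   # about 1.8
```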
R: how to check how many cores/CPU usage available
On Linux you can run the ps command from R. It reports the average CPU usage and the memory usage of every process named rsession (the RStudio session process):
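# Ask ps for every process named "rsession" (Linux only)
ps_out <- system("ps -C rsession -o %cpu,%mem,pid,cmd", intern = TRUE)
# Drop the header line, trim the padding and split on runs of whitespace,
# so the column positions do not depend on how wide each value is
fields <- strsplit(trimws(ps_out[-1]), "\\s+")
df <- do.call(rbind, lapply(fields,
  function(x) data.frame(
    cpu = as.numeric(x[1]),
    mem = as.numeric(x[2]),
    pid = as.numeric(x[3]),
    cmd = paste(x[-(1:3)], collapse = " "))))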
df
# cpu mem pid cmd
#1 0.8 0.7 11001 /usr/lib/rstudio/bin/rsession
#2 0.0 0.2 12397 /usr/lib/rstudio/bin/rsession
#3 0.1 0.7 14960 /usr/lib/rstudio/bin/rsession
#4 0.4 0.2 26122 /usr/lib/rstudio-server/bin/rsession
#5 0.3 8.3 35782 /usr/lib/rstudio/bin/rsession
You can probably improve it to get the parent id and the instantaneous CPU usage with other options passed to ps or top and deduce the number of cores used by each session.
On Windows you can try this:
a <- system("wmic path Win32_PerfFormattedData_PerfProc_Process get Name,PercentProcessorTime", intern = TRUE)
df <- do.call(rbind, lapply(strsplit(a, " "), function(x) {
  x <- x[x != ""]  # drop the empty strings left by the column padding
  data.frame(process = x[1], cpu = x[2])
}))
df[grepl("Rgui|rstudio", df$process),]
# process cpu
# 105 Rgui 0
# 108 rstudio 0
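For the "how many cores" half of the question, a minimal sketch using detectCores() from the parallel package is shown below. The variable names are just for illustration, and note that the physical-core count is not available on every platform, in which case NA is returned:

```r
library(parallel)

# Logical cores (includes hyper-threads); may be NA if it cannot be determined
n_logical <- detectCores(logical = TRUE)

# Physical cores; only honoured on some platforms, otherwise NA or the logical count
n_physical <- detectCores(logical = FALSE)

c(logical = n_logical, physical = n_physical)
```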
Unlimiting the CPU usage from R
R is single-threaded by default, so it uses only one core, which is a pity if you have a machine with 16 or 32 cores. By "unlimiting the CPU usage", I assume you're asking whether an R process (say, part of the k-means algorithm) can take advantage of your full CPU power by running in parallel.
Many R packages and processes are not going to be helped by parallel processing, though, so the technical solution to your particular problem comes down to the implementation of the package you're using. Popular packages like caret do support parallelization where possible, even though you may need to add an additional allowParallel=TRUE parameter. They work in conjunction with a library such as doMC to run on multiple cores. In the following sample code, I have my machine use 8 cores through the registerDoMC(8) function, and then set allowParallel=TRUE.
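library(caret)   # provides trainControl() and train()
library(doMC)    # multicore backend for foreach (not available on Windows)
registerDoMC(8)  # register 8 worker processes
system.time({
  # fb_train is the training data frame from the original question, with outcome column classe
  ctrl_2 <- trainControl(method = "cv", number = 3, allowParallel = TRUE)
  fb_forest_2 <- train(classe ~ ., data = fb_train, method = "rf", trControl = ctrl_2)
})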
Again, parallel processing doesn't always help: not every process can be parallelized! The documentation for foreach is a great read, so if you can afford the time, take a look at it. The specific code solution for your problem also depends on the library implementation you're using.
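To give a flavour of foreach on its own, here is a minimal sketch. It swaps in the doParallel backend instead of doMC, because doParallel also runs on Windows; the two workers and the toy computation are assumptions for illustration, and both packages must be installed.

```r
library(foreach)
library(doParallel)

cl <- makeCluster(2)      # two worker processes (arbitrary choice)
registerDoParallel(cl)    # tell foreach to send %dopar% work to this cluster

# Each iteration runs on a worker; .combine = c glues the results into a vector
roots <- foreach(i = 1:4, .combine = c) %dopar% sqrt(i)

stopCluster(cl)           # release the workers
roots
```

Replacing %dopar% with %do% runs the same loop sequentially, which makes it easy to time both versions and check whether parallelism actually pays off for your workload.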