doParallel, cluster vs cores
The behavior of doParallel::registerDoParallel(<numeric>) depends on the operating system; see print(doParallel::registerDoParallel) for details.
On Windows machines,
doParallel::registerDoParallel(4)
effectively does
cl <- makeCluster(4)
doParallel::registerDoParallel(cl)
i.e. it sets up four ("PSOCK") workers that run in background R sessions. Then, %dopar% will basically utilize the parallel::parLapply() machinery. With this setup, you do have to worry about global variables and packages being attached on each of the workers.
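A minimal sketch of what that worry looks like in practice, assuming the doParallel package is installed. The variable my_offset and the file names are hypothetical and used only for illustration. It checks that a global defined in the master session is absent from the fresh worker sessions until it is exported, and uses the .packages argument of foreach() to attach a package on the workers:

```r
library(doParallel)   # also attaches foreach and parallel

cl <- makeCluster(2)              # two background PSOCK R sessions
registerDoParallel(cl)

my_offset <- 10                   # hypothetical global, defined only in the master session

# Each worker is a fresh R session, so the global is not there yet.
before <- unlist(clusterEvalQ(cl, exists("my_offset")))

# Copy the global to every worker explicitly.
clusterExport(cl, "my_offset", envir = environment())
after <- unlist(clusterEvalQ(cl, exists("my_offset")))

# 'tools' is not attached by default; .packages attaches it on the workers.
res <- foreach(i = 1:4, .combine = c, .packages = "tools") %dopar% {
  file_path_sans_ext(sprintf("run%d.csv", i + my_offset))
}

stopCluster(cl)
```

foreach() also tries to export variables that the loop body references directly, so the explicit clusterExport() call matters most for objects that are only used indirectly, for example inside helper functions.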
However, on non-Windows machines, after
doParallel::registerDoParallel(4)
%dopar% will utilize the parallel::mclapply() machinery, which in turn relies on forked processes. Since forking is used, you don't have to worry about globals and packages.
Option cores from package doParallel useless on Windows?
The answer from the maintainer of package doParallel, Rich Calaway:
Windows does not support forking, which is what the parallel (and doParallel) packages use the “cores” argument for. So, on Windows, all “cores” arguments are set to 1. To use multiple cores on Windows with doParallel, use makeCluster to create a multiple worker cluster cl, then registerDoParallel(cl).
So this isn't a bug, but a non-Windows feature, which is a pity.
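Following the maintainer's advice, one way to write a registration step that behaves sensibly on both kinds of platform is sketched below. This is only an illustration, not an official recipe; n_workers is a hypothetical value chosen for the example.

```r
library(doParallel)

n_workers <- 2   # hypothetical worker count chosen for illustration

if (.Platform$OS.type == "windows") {
  # No forking on Windows: create an explicit cluster and register it.
  cl <- makeCluster(n_workers)
  registerDoParallel(cl)
} else {
  # Forking is available: let doParallel use the mclapply-based backend.
  cl <- NULL
  registerDoParallel(cores = n_workers)
}

backend <- getDoParName()      # name of the backend that got registered
workers <- getDoParWorkers()   # number of workers foreach will use

squares <- foreach(i = 1:4, .combine = c) %dopar% i^2

# Only an explicitly created cluster needs to be stopped by us.
if (!is.null(cl)) stopCluster(cl)
```

Checking getDoParWorkers() after registration is a quick way to confirm that the loop will not silently run on a single worker.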
Do I still need makeCluster if I'm already calling registerDoParallel(cl)?
On a Windows machine, these two examples are basically equivalent. The only difference is that the first example uses an explicit cluster object and the second uses an implicit cluster object that is created when you execute registerDoParallel. The performance of the two examples should be the same.
On a Mac or Linux machine, the first example uses the snow-derived backend (exactly the same as on a Windows machine), ultimately using clusterApplyLB to perform the parallel computations. The second example uses the multicore-derived backend (which was never available on Windows), ultimately using mclapply to perform the parallel computations, which will probably be somewhat more efficient than the first example.
Why does increasing the number of cores make a difference?
In practice, it is best to set the number of computing threads equal to the number of hardware (physical) cores, which is 2 in your example.
More details:
If your workload is compute-intensive, using more threads than hardware cores makes the threads compete for resources and degrades performance. However, in some cases, such as your example, the workload requires many memory accesses per computation, so extra threads can help hide memory latency. The CPU itself is latency-oriented and hides some latency automatically. In your case, more than 2 threads can bring further improvement, but not much.
Therefore, rather than spending time tuning the number of threads on each system for every run, it is better to use the number of hardware cores in your parallel computing program.
A good introduction to parallel computing with R is here.
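The recommendation above (one worker per hardware core) can be sketched with parallel::detectCores(). Note that the logical = FALSE argument is not honoured on every platform and that the function may return NA, so the fallback to a single worker below is an assumption made for this sketch; n_workers is a hypothetical name.

```r
library(parallel)

# Physical cores where the platform can report them, logical CPUs otherwise.
n_physical <- detectCores(logical = FALSE)
n_logical  <- detectCores(logical = TRUE)

# detectCores() may return NA; fall back to a single worker in that case.
n_workers <- if (is.na(n_physical)) 1L else max(1L, n_physical)
```

The resulting n_workers can then be passed to makeCluster(n_workers) or registerDoParallel(cores = n_workers).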
The difference between doMC and doParallel in R
The doParallel package is a merger of doSNOW and doMC, much as parallel is a merger of snow and multicore. But although doParallel has all the features of doMC, I was told by Rich Calaway of Revolution Analytics that they wanted to keep doMC around because it was more efficient in certain circumstances, even though doMC now uses parallel just like doParallel. I haven't personally run any benchmarks to determine if and when there is a significant difference.
I tend to use doMC on a Linux or Mac OS X computer, doParallel on a Windows computer, and doMPI on a Linux cluster, but doParallel does work on all of those platforms.
As for the different registration methods, if you execute:
registerDoParallel(cores=3)
on a Windows machine, it will create a cluster object implicitly for later use with clusterApplyLB, whereas on Linux and Mac OS X, no cluster object is created or used. The number of cores is simply remembered and used as the value of the mc.cores argument later when calling mclapply.
If you execute:
cl <- makeCluster(3)
registerDoParallel(cl)
then the registered cluster object will be used with clusterApplyLB regardless of the platform. You are correct that in this case, it is your responsibility to shut down the cluster object since you created it, whereas the implicit cluster object is shut down automatically.
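One way to make that clean-up hard to forget is to create and stop the cluster inside the same function. The sketch below assumes doParallel is installed; run_in_parallel is a hypothetical helper written for illustration, not part of any package.

```r
library(doParallel)

# Hypothetical helper: owns the cluster for the duration of one call.
run_in_parallel <- function(xs, n_workers = 2) {
  cl <- makeCluster(n_workers)
  # We created the cluster, so we shut it down, even if the loop fails.
  on.exit(stopCluster(cl), add = TRUE)
  # Afterwards fall back to the sequential backend so that later %dopar%
  # calls do not try to use the stopped cluster.
  on.exit(registerDoSEQ(), add = TRUE)

  registerDoParallel(cl)
  foreach(x = xs, .combine = c) %dopar% sqrt(x)
}

roots <- run_in_parallel(c(1, 4, 9))
```

For the implicit cluster, doParallel also exports stopImplicitCluster() in case you want to shut it down before the R session ends.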