doParallel, cluster vs cores

The behavior of doParallel::registerDoParallel(<numeric>) depends on the operating system; see print(doParallel::registerDoParallel) for details.

On Windows machines,

doParallel::registerDoParallel(4)

effectively does

cl <- makeCluster(4)
doParallel::registerDoParallel(cl)

i.e. it sets up four ("PSOCK") workers that run in background R sessions. %dopar% will then basically use the parallel::parLapply() machinery. With this setup, you do have to make sure that the global variables and packages your code needs are available on each of the workers.
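As a rough sketch of what that means in practice, the example below registers an explicit two-worker cluster and names the package each worker must attach. The variable `scale_by` and the choice of `tools::file_ext()` are only illustrative; `.packages` is the foreach argument for attaching packages on workers, and foreach normally ships variables it can see in the loop body on its own.

```r
library(doParallel)  # also attaches foreach and parallel

cl <- makeCluster(2)        # two background R sessions (PSOCK workers)
registerDoParallel(cl)

scale_by <- 10              # illustrative global, referenced directly in the loop body

res <- foreach(i = 1:4, .combine = c,
               .packages = "tools") %dopar% {
  # file_ext() lives in 'tools', which a fresh worker has not attached,
  # so it is requested through .packages above
  nchar(file_ext(paste0("file", i, ".csv"))) * scale_by
}

stopCluster(cl)             # we created the cluster, so we stop it
print(res)
```

Dropping the `.packages` argument here would be expected to fail on the workers with a "could not find function" error, which is the kind of problem forked workers do not have.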

However, on non-Windows machines,

doParallel::registerDoParallel(4)

makes %dopar% use the parallel::mclapply() machinery, which in turn relies on forked processes. Because the workers are forks of the main R session, they already see its globals and attached packages, so you don't have to worry about either.
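A small sketch of the forked case, guarded so that it only runs on non-Windows systems. The names `lookup` and `res` are made up for illustration. Note that nothing is exported and no `.packages` argument is given, yet the loop body can use both the global and a package attached only in the parent session.

```r
library(doParallel)
library(tools)                        # attached in the parent session only

res <- NULL
if (.Platform$OS.type != "windows") {
  registerDoParallel(cores = 2)       # no cluster object; forked workers are used
  lookup <- c(a = 1, b = 2)           # illustrative global
  res <- foreach(k = c("a", "b"), .combine = c) %dopar% {
    # both 'lookup' and file_ext() are inherited from the parent via fork
    lookup[[k]] + nchar(file_ext("x.csv"))
  }
  print(res)
}
```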

Option cores from package doParallel useless on Windows?

The answer from the maintainer of package doParallel, Rich Calaway:

Windows does not support forking, which is what the parallel (and doParallel) packages use the “cores” argument for. So, on Windows, all “cores” arguments are set to 1. To use multiple cores on Windows with doParallel, use makeCluster to create a multiple worker cluster cl, then registerDoParallel(cl).

So this isn't a bug, but a non-Windows feature, which is a pity.
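Following that advice, one portable pattern is to always go through an explicit cluster, which behaves the same way on every platform. The helper name `register_workers` below is hypothetical, not part of doParallel; it is only a sketch of the makeCluster plus registerDoParallel(cl) route described in the quote.

```r
library(doParallel)

# Hypothetical helper: register n workers through an explicit cluster.
register_workers <- function(n) {
  cl <- makeCluster(n)
  registerDoParallel(cl)
  cl                                  # returned so the caller can stop it later
}

cl <- register_workers(2)
n_workers <- getDoParWorkers()        # how many workers foreach will use
squares <- foreach(i = 1:4, .combine = c) %dopar% {
  i^2
}
stopCluster(cl)
```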

do I still need to makeCluster if I'm already doing registerDoParallel(cl)

On a Windows machine, these two examples are basically equivalent. The only difference is that the first example uses an explicit cluster object and the second uses an implicit cluster object that is created when you execute registerDoParallel. The performance of the two examples should be the same.

On a Mac or Linux machine, the first example uses the snow-derived backend (exactly the same as on a Windows machine), ultimately using clusterApplyLB to perform the parallel computations. The second example uses the multicore-derived backend (which was never available on Windows), ultimately using mclapply to perform the parallel computations, which will probably be somewhat more efficient than the first example.
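To check which backend a given registration actually selected, foreach provides getDoParName() and getDoParWorkers(). The sketch below registers both ways in turn; the variable names are illustrative, and the backend names in the comments are what doParallel reports in the versions I have seen, so treat them as an assumption.

```r
library(doParallel)

# Explicit cluster: snow-derived backend on every platform.
cl <- makeCluster(2)
registerDoParallel(cl)
explicit_backend <- getDoParName()    # reported as "doParallelSNOW" in versions I have seen
stopCluster(cl)

# Cores only: the backend depends on the platform.
registerDoParallel(cores = 2)
implicit_backend <- getDoParName()    # multicore-derived on Mac/Linux, snow-derived on Windows
implicit_workers <- getDoParWorkers()
stopImplicitCluster()                 # stops the implicit cluster if one was created

print(c(explicit = explicit_backend, implicit = implicit_backend))
```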

Why increasing the number of cores makes a difference?

In practice, it is best to set the number of computing threads equal to the number of hardware (physical) cores, which is 2 in your example.

More details:

If your workload is compute-intensive, running more threads than there are hardware cores makes them compete for resources and degrades performance. However, in some cases, such as your example, the workload needs many memory accesses per computation, so extra threads can help hide memory latency. The CPU is itself latency-oriented and already hides some latency automatically. In your case, using more than 2 threads can bring further improvement, but not much.

Therefore, rather than spending time tuning the number of threads for each system and each run, it is better to default to the number of hardware cores in your parallel computing program.
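A minimal sketch of picking that default with parallel::detectCores(). The fallback to a single worker is my own assumption for the case where the count cannot be determined, and on some platforms `logical = FALSE` may not be honoured and can return the logical count instead.

```r
library(parallel)

physical <- detectCores(logical = FALSE)  # physical cores where the platform supports it; may be NA
logical  <- detectCores(logical = TRUE)   # logical CPUs (e.g. with hyper-threading)

# Fall back to one worker if the physical count is unavailable.
n_workers <- if (is.na(physical)) 1L else max(1L, physical)

print(c(physical = physical, logical = logical, chosen = n_workers))
```

The chosen value can then be passed to makeCluster() or to registerDoParallel(cores = ...).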

A good introduction to parallel computing with R is here.

the difference between doMC and doParallel in R

The doParallel package is a merger of doSNOW and doMC, much as parallel is a merger of snow and multicore. But although doParallel has all the features of doMC, I was told by Rich Calaway of Revolution Analytics that they wanted to keep doMC around because it was more efficient in certain circumstances, even though doMC now uses parallel just like doParallel. I haven't personally run any benchmarks to determine if and when there is a significant difference.

I tend to use doMC on a Linux or Mac OS X computer, doParallel on a Windows computer, and doMPI on a Linux cluster, but doParallel does work on all of those platforms.

As for the different registration methods, if you execute:

registerDoParallel(cores=3)

on a Windows machine, it will create a cluster object implicitly for later use with clusterApplyLB, whereas on Linux and Mac OS X, no cluster object is created or used. The number of cores is simply remembered and used as the value of the mc.cores argument later when calling mclapply.

If you execute:

cl <- makeCluster(3)
registerDoParallel(cl)

then the registered cluster object will be used with clusterApplyLB regardless of the platform. You are correct that in this case, it is your responsibility to shut down the cluster object since you created it, whereas the implicit cluster object is automatically shut down.
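One way to make sure an explicit cluster is always shut down is to tie stopCluster() to on.exit() inside a function. The wrapper name `run_with_cluster` is hypothetical; re-registering the sequential backend afterwards with registerDoSEQ() is an extra precaution of mine so that later %dopar% calls do not point at a stopped cluster.

```r
library(doParallel)

# Hypothetical wrapper: the cluster is stopped even if the loop fails.
run_with_cluster <- function(n, xs) {
  cl <- makeCluster(n)
  on.exit({
    stopCluster(cl)        # our cluster, our responsibility
    registerDoSEQ()        # fall back to the sequential backend
  }, add = TRUE)
  registerDoParallel(cl)
  foreach(x = xs, .combine = c) %dopar% {
    sqrt(x)
  }
}

roots <- run_with_cluster(2, c(1, 4, 9))
print(roots)
```

For the implicit cluster created by registerDoParallel(cores = ...) on Windows, doParallel also exports stopImplicitCluster() if you want to shut it down yourself.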
