Parallel Processing in R Limited

R no longer runs in parallel

I found the solution! Ironically, to get parallel processing back I had to perform both of the steps I mentioned in the question at the same time.

So, start R with

taskset 0xffff R

Then, from within R, run

system(sprintf("taskset -p 0xffffffff %d", Sys.getpid()))

Voilà, parallel processing returns.
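
To verify that it worked, here is a minimal sketch (assuming a UNIX-like system with at least four cores): once the affinity mask covers several cores again, the parallel map should run roughly four times faster than the serial one.

library(parallel)

slow <- function(i) { Sys.sleep(1); i }

# Serial baseline: ~4 seconds
system.time(lapply(1:4, slow))

# Parallel: ~1 second once the affinity mask covers several cores
system.time(mclapply(1:4, slow, mc.cores = 4))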

R: How to use parallelMap (with mlr, xgboost) on a Linux server? Unexpected performance compared to Windows

This question is more about guessing what's wrong in your setup than actually providing a "real" answer. Maybe you could also change the title, as you did not get "unexpected results".

Some points:

  • nthread = 1 is already the default for xgboost in mlr
  • multicore is the preferred mode on UNIX systems
  • If your local machine is faster than your server, then either your calculations finish very quickly and the CPU frequencies of the two machines differ substantially, or you should think about parallelizing at a level other than mlr.tuneParams (see here for more information, and the sketch after this list)
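
To illustrate that last point, here is a minimal sketch that registers the parallelization at the resampling level instead of the tuning level (assuming the "mlr.resample" level; parallelGetRegisteredLevels() lists the levels available on your installation):

library(mlr)
library(parallelMap)

# Parallelize the inner resampling loop rather than the tuning iterations;
# this can pay off when each fold is expensive but there are few of them
parallelStart(mode = "multicore", cpus = 4, level = "mlr.resample")
rdesc = makeResampleDesc("CV", iters = 10L)
res = resample("classif.ksvm", iris.task, rdesc)
parallelStop()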

Edit

Everything's fine on my machine. Looks like a local problem on your side.

library(mlr)
#> Loading required package: ParamHelpers
#> Registered S3 methods overwritten by 'ggplot2':
#>   method         from
#>   [.quosures     rlang
#>   c.quosures     rlang
#>   print.quosures rlang
library(parallelMap)

numeric_ps = makeParamSet(
  makeNumericParam("C", lower = 0.5, upper = 2.0),
  makeNumericParam("sigma", lower = 0.5, upper = 2.0)
)
ctrl = makeTuneControlRandom(maxit = 1024L)
rdesc = makeResampleDesc("CV", iters = 3L)

#In serial
start.time.serial <- Sys.time()
res.serial = tuneParams("classif.ksvm", task = iris.task, resampling = rdesc,
                        par.set = numeric_ps, control = ctrl)
#> [Tune] Started tuning learner classif.ksvm for parameter set:
#>          Type len Def   Constr Req Tunable Trafo
#> C     numeric   -   - 0.5 to 2   -    TRUE     -
#> sigma numeric   -   - 0.5 to 2   -    TRUE     -
#> With control class: TuneControlRandom
#> Imputation value: 1
stop.time.serial <- Sys.time()
stop.time.serial - start.time.serial
#> Time difference of 31.28781 secs

#In parallel with 2 CPUs
start.time.parallel.2 <- Sys.time()
parallelStart(mode="multicore", cpu=2, level="mlr.tuneParams")
#> Starting parallelization in mode=multicore with cpus=2.
res.parallel.2 = tuneParams("classif.ksvm", task = iris.task, resampling = rdesc,
                            par.set = numeric_ps, control = ctrl)
#> [Tune] Started tuning learner classif.ksvm for parameter set:
#>          Type len Def   Constr Req Tunable Trafo
#> C     numeric   -   - 0.5 to 2   -    TRUE     -
#> sigma numeric   -   - 0.5 to 2   -    TRUE     -
#> With control class: TuneControlRandom
#> Imputation value: 1
#> Mapping in parallel: mode = multicore; level = mlr.tuneParams; cpus = 2; elements = 1024.
#> [Tune] Result: C=1.12; sigma=0.647 : mmce.test.mean=0.0466667
parallelStop()
#> Stopped parallelization. All cleaned up.
stop.time.parallel.2 <- Sys.time()
stop.time.parallel.2 - start.time.parallel.2
#> Time difference of 16.13145 secs

#In parallel with 4 CPUs
start.time.parallel.4 <- Sys.time()
parallelStart(mode="multicore", cpu=4, level="mlr.tuneParams")
#> Starting parallelization in mode=multicore with cpus=4.
res.parallel.4 = tuneParams("classif.ksvm", task = iris.task, resampling = rdesc,
                            par.set = numeric_ps, control = ctrl)
#> [Tune] Started tuning learner classif.ksvm for parameter set:
#>          Type len Def   Constr Req Tunable Trafo
#> C     numeric   -   - 0.5 to 2   -    TRUE     -
#> sigma numeric   -   - 0.5 to 2   -    TRUE     -
#> With control class: TuneControlRandom
#> Imputation value: 1
#> Mapping in parallel: mode = multicore; level = mlr.tuneParams; cpus = 4; elements = 1024.
#> [Tune] Result: C=0.564; sigma=0.5 : mmce.test.mean=0.0333333
parallelStop()
#> Stopped parallelization. All cleaned up.
stop.time.parallel.4 <- Sys.time()
stop.time.parallel.4 - start.time.parallel.4
#> Time difference of 10.14408 secs

Created on 2019-06-14 by the reprex package (v0.3.0)

Communication of parallel processes: what are my options?

For communication between processes, a kind of fun place to start is the help page ?socketConnection and the code in the chunk marked "## Not run:". So start an R process and run

 con1 <- socketConnection(port = 6011, server=TRUE)

This process is acting as a server, listening on a particular port for some information. Now start a second R process and enter

 con2 <- socketConnection(Sys.info()["nodename"], port = 6011)

con2 in process 2 has made a socket connection with con1 on process 1. Back at con1, write out the R object LETTERS

writeLines(LETTERS, con1)

and retrieve them on con2.

readLines(con2)
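
When you are done, close both ends of the connection:

close(con2)  # in process 2
close(con1)  # in process 1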

So you've communicated between processes without writing to disk. Some important concepts are also implicit here, e.g., about blocking vs. non-blocking connections. Nor is the approach limited to communication on the same machine, provided the ports are accessible across whatever network the computers are on.

This is the basis for makePSOCKcluster in the parallel package, with the addition that process 1 actually uses the system command and a script in the parallel package to start process 2. The object returned by makePSOCKcluster is sub-settable, so that you can dedicate a fraction of your cluster to solving a particular task. In principle you could arrange for the spawned nodes to communicate with one another independently of the node that did the spawning.
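
For instance, a minimal sketch of that sub-setting (the worker count of 4 is arbitrary):

library(parallel)

# Start 4 worker R processes, each reached over a socket as in the
# manual example above
cl <- makePSOCKcluster(4)

# The cluster object is sub-settable: run this task on the first two workers only
parLapply(cl[1:2], 1:2, function(i) Sys.getpid())

stopCluster(cl)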

An interesting exercise is to do the same using the fork-like commands in the parallel package (on non-Windows). A high-level version of this is in the help page ?mcparallel, e.g.,

library(parallel)
p <- mcparallel(1:10)
q <- mcparallel(1:20)
# wait for both jobs to finish and collect all results
res <- mccollect(list(p, q))

but this builds on top of the lower-level sendMaster and friends (peek at the mcparallel and mccollect source code).

The Rmpi package takes an approach like the PSOCK example, where the manager uses scripts to spawn workers, with communication using MPI rather than sockets. But a different approach, worthy of a weekend project if you have a functioning MPI implementation, is to implement a script that does the same calculation on different data and then collates the results onto a single node, using commands like mpi.comm.rank, mpi.barrier, mpi.send.Robj, and mpi.recv.Robj.
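
A rough sketch of that pattern, assuming a working MPI installation and a launch along the lines of mpirun -n 4 Rscript script.R (the communicator number and the data split are illustrative):

library(Rmpi)

rank <- mpi.comm.rank(comm = 0)  # this process's rank
size <- mpi.comm.size(comm = 0)  # total number of processes

# Each rank computes on its own slice of the same problem
partial <- sum((1:1000)[seq(rank + 1, 1000, by = size)])

if (rank != 0) {
  # Workers send their partial results to rank 0
  mpi.send.Robj(partial, dest = 0, tag = 1, comm = 0)
} else {
  # Rank 0 collates the results onto a single node
  total <- partial
  for (src in seq_len(size - 1))
    total <- total + mpi.recv.Robj(source = src, tag = 1, comm = 0)
  cat("total:", total, "\n")
}

mpi.barrier(comm = 0)
mpi.quit()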

A fun weekend project would use the parallel package to implement a workflow that involves parallel computation, but not of the mclapply variety: e.g., one process harvests data from a web site and then passes it to another process that draws pretty pictures. The input to the first process might well be JSON, but the communication within R is probably much more appropriately done with R data objects.

Parallel processing data analysis - Is there a benefit to having more splits than processor cores?

Broadly speaking, the advantage of splitting it up into more parts is that you can optimize your processor use.

If the dataset is split into 3 parts, one per processor, and they take the following times:

Split A - 10 min
Split B - 20 min
Split C - 12 min

You can see immediately that two of your processors are going to be idle for a significant portion of the time needed to do the full analysis.

Instead, if you have 12 splits, each one taking between 3 and 6 minutes to run, then a processor can pick up another chunk of the job after it finishes its first one, instead of idling until the longest-running split finishes. A small sketch of the effect follows.
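
Here is a hypothetical timing experiment (Sys.sleep stands in for real work, scaled from minutes to tenths of seconds; assumes a UNIX-like machine with at least 3 cores):

library(parallel)

task <- function(d) { Sys.sleep(d); d }

# 3 uneven splits on 3 cores: total time is bounded by the slowest split (~0.20 s)
splits3 <- c(0.10, 0.20, 0.12)
system.time(mclapply(splits3, task, mc.cores = 3))

# 12 smaller splits: with mc.preschedule = FALSE each core picks up the next
# chunk as soon as it is free, so no core idles while a long split runs (~0.15 s)
splits12 <- rep(splits3 / 4, each = 4)
system.time(mclapply(splits12, task, mc.cores = 3, mc.preschedule = FALSE))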


