R Programming - Submitting Jobs on a Multiple-Node Linux Cluster Using PBS

Submitting R jobs using PBS

Why not do the following: ask PBS for ppn=4 and, additionally, ask for all the memory on the node, i.e.,

#PBS -l nodes=1:ppn=4 -l mem=31944900k 

This might not be possible on your setup.

I am not sure how your R code is parallelized, but if it uses OpenMP you could certainly ask for all 8 cores and set OMP_NUM_THREADS to 4.
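As a rough sketch of what such a submission script could look like, assuming a Torque/PBS setup; the script name run_analysis.R is a placeholder, and the memory figure is the one quoted above:

#!/bin/bash
#PBS -l nodes=1:ppn=4
#PBS -l mem=31944900k

# Cap OpenMP-parallelized code at 4 threads even if more cores are visible
export OMP_NUM_THREADS=4

cd "$PBS_O_WORKDIR"

# run_analysis.R stands in for your actual R script
Rscript run_analysis.R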

Does a PBS batch system move multiple serial jobs across nodes?

No, PBS won't automatically distribute the jobs among nodes for you. But this is a common thing to want to do, and you have a few options.

  • Easiest, and in some ways most advantageous for you, is to bunch the tasks into 1-node-sized chunks and submit those bundles as individual jobs (see the first sketch after this list). This will get your jobs started faster; a 1-node job will normally get scheduled faster than a (say) 14-node job, just because there are more one-node-sized holes in the schedule than 14-node ones. This works particularly well if all the jobs take roughly the same amount of time, because then doing the division is pretty simple.

  • If you do want to do it all in one job (say, to simplify the bookkeeping), you may or may not have access to the pbsdsh command; there's a good discussion of it here. This lets you run a single script on all the processors in your job. You then write a script which queries $PBS_VNODENUM to find out which of the nnodes*ppn tasks it is, and runs the appropriate one (see the second sketch after this list).

  • If not pbsdsh, GNU Parallel is another tool which can enormously simplify these tasks. It's like xargs, if you're familiar with that, but it will run commands in parallel, including on multiple nodes. So you'd submit your (say) 14-node job and have the first node run a GNU Parallel script (see the third sketch after this list). The nice thing is that this will do the scheduling for you even if the jobs are not all of the same length. The advice we give to users on our system for using GNU Parallel for these sorts of things is here. Note that if GNU Parallel isn't installed on your system, and for some reason your sysadmins won't do it, you can set it up in your home directory; it's not a complicated build.
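For the first option, a minimal sketch of one single-node bundle; task1.R through task4.R are hypothetical placeholders for your own serial scripts:

#!/bin/bash
#PBS -l nodes=1:ppn=4

cd "$PBS_O_WORKDIR"

# Run one serial R task per core in the background, then wait for all of them
Rscript task1.R &
Rscript task2.R &
Rscript task3.R &
Rscript task4.R &
wait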
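For the pbsdsh option, a sketch of the per-slot wrapper, assuming your tasks are listed one command per line in a hypothetical file tasklist.txt and the job script calls pbsdsh $PBS_O_WORKDIR/wrapper.sh:

#!/bin/bash
# wrapper.sh - pbsdsh launches one copy of this per processor in the job
cd "$PBS_O_WORKDIR"

# $PBS_VNODENUM runs from 0 to nnodes*ppn - 1; use it to pick this slot's task
TASK=$(sed -n "$((PBS_VNODENUM + 1))p" tasklist.txt)

if [ -n "$TASK" ]; then
    eval "$TASK"
fi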
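And for the GNU Parallel option, a sketch of the job script, assuming parallel is on the $PATH of every node and the tasks again sit one command per line in tasklist.txt:

#!/bin/bash
#PBS -l nodes=14:ppn=8

cd "$PBS_O_WORKDIR"

# $PBS_NODEFILE lists one line per allocated core; collapse it to unique hosts
sort -u "$PBS_NODEFILE" > nodes.txt

# Run the commands across the nodes, keeping 8 jobs running on each
parallel --jobs 8 --sshloginfile nodes.txt --workdir "$PBS_O_WORKDIR" < tasklist.txt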

GNU Parallel --jobs option using multiple nodes on a cluster with multiple CPUs per node

  1. Yes: -j is the number of jobs per node.
  2. Yes: Install 'parallel' in your $PATH on the remote hosts.
  3. Yes: It is a consequence of parallel missing from the $PATH.

GNU Parallel logs in to the remote machine and tries to determine the number of cores (using parallel --number-of-cores); when that fails, it defaults to 1 CPU core per host. By giving -j2 you tell GNU Parallel not to try to determine the number of cores.

Did you know that you can also give the number of cores in the --sshlogin as 4/myserver? This is useful if you have a mix of machines with different numbers of cores.
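A small sketch pulling these points together; the host names, work.R, and the input files are placeholders:

# Run 2 jobs per host on each of the listed machines
parallel -j2 --sshlogin node01,node02,node03 "Rscript work.R {}" ::: input*.csv

# Or state the core count per host explicitly, handy for heterogeneous nodes
parallel --sshlogin 4/node01 --sshlogin 8/node02 "Rscript work.R {}" ::: input*.csv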

Force-load R packages while running a job on the cluster

The loop below keeps retrying the package loads until every package in the list is attached, sleeping 5 seconds between attempts. This forces the packages to load when the job is run through qsub, even if an individual load occasionally fails.

myPackages <- c("biomaRt", "dplyr", "stringi", "GenomicFeatures",
                "Rsamtools", "foreach", "doMC")
tryCount <- 0

# Keep retrying until every package in myPackages is attached
while (!all(myPackages %in% (.packages()))) {

  # Attempt to attach each package; try() keeps a single failure
  # from aborting the whole loop
  for (pkg in myPackages) {
    try(require(pkg, character.only = TRUE))
  }

  tryCount <- tryCount + 1

  if (!all(myPackages %in% (.packages()))) {
    cat(paste0("Failure: ", tryCount, "\n"))
    cat("Failed to load: ")
    cat(myPackages[!myPackages %in% (.packages())])
    cat("\n")
  } else {
    print("Success!")
  }

  # Wait 5 seconds before the next attempt
  Sys.sleep(5)
}

