Use Slurm Job Id

You can do something like this:

RES=$(sbatch simulation) && sbatch --dependency=afterok:${RES##* } postprocessing

The RES variable will hold the output of the sbatch command, something like Submitted batch job 102045. The construct ${RES##* } is a Bash parameter expansion that isolates the last word, in this case the job ID. The && part ensures you do not try to submit the second job in case the first submission fails.
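To illustrate what that expansion does, here is a small sketch run on a typical sbatch output string (the job ID below is made up):

```shell
# Simulate sbatch's output to show what ${RES##* } extracts
RES="Submitted batch job 102045"   # typical sbatch output (made-up job ID)
JOBID=${RES##* }                   # drop everything up to and including the last space
echo "$JOBID"                      # prints 102045
```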

Use slurm JobID as input?

You should use the environment variable $SLURM_JOBID in make-dir.sh:

#!/bin/bash

echo $SLURM_JOBID
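A slightly fuller sketch of what make-dir.sh could look like; the output_ prefix and the "manual" fallback for runs outside Slurm are assumptions, not part of the original:

```shell
#!/bin/bash
# Hypothetical make-dir.sh: create a directory named after the current job ID.
# Outside of a Slurm job, SLURM_JOBID is unset, so fall back to "manual" (assumption).
dir="output_${SLURM_JOBID:-manual}"
mkdir -p "$dir"
echo "created $dir"
```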

Save slurm job ID as you submit the job with sbatch

This can be done with --parsable

job_id=$(sbatch --parsable test.sh)
echo $job_id
17211434
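One caveat worth knowing: on multi-cluster (federated) setups, --parsable prints the job ID and the cluster name separated by a semicolon, so you may want to strip that suffix. A sketch with a made-up output string:

```shell
# On federated clusters, sbatch --parsable can print "jobid;clustername".
out="17211434;cluster1"   # made-up example of such output
job_id=${out%%;*}         # keep only the part before the first ';'
echo "$job_id"            # prints 17211434
```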

How to pass the SLURM-jobID as an input argument to python?

The SLURM_JOBID environment variable is made available for the job processes only, not for the process that submits the jobs. The job id is returned from the sbatch command so if you want it in a variable, you need to assign it.

do  # the enclosing loop header (iterating over ${layer} and ${dc}) is not shown in the original snippet
    SLURM_JOBID=$(sbatch --parsable -N 1 -n 1 --mem=20G --mail-type=END --gres=gpu:V100:3 --wrap="singularity --noslurm tensorflow_19.03-py3.simg python run_CNN_dynlayer.py \${SLURM_JOBID} ${layer} ${dc}")  # \$ defers expansion so the job sees its own ID at run time
    sleep 1 # pause 1s to be kind to the scheduler...
    echo "jobid: ${SLURM_JOBID}"
    echo " --- next --- "
done

Note the use of the command substitution $() jointly with the --parsable argument of sbatch.

Note also that the Submitted batch job 3182711 line will no longer appear in the output, as it is captured to populate the SLURM_JOBID variable.

SLURM dependency on nonexistent job ID

What happens if some or all IDs in $dependencylist are not valid job IDs or have never been submitted?

From a test made with Slurm 20.02.7, it appears that if the job ID is unknown to slurmctld (either not submitted yet, or from a job older than the configured MinJobAge), the option is silently ignored. scontrol show job then reports Dependency=(null). This does not change afterwards, even if a job with that ID appears.

Alternatively, how can I provide empty arguments to sbatch?

You can take advantage of the above-described behaviour by using "0" as job ID when no dependency is required.
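A hedged sketch of a submission wrapper built on that behaviour: it falls back to 0 when no dependency is passed. The script name job.sh is hypothetical, and sbatch is echoed rather than executed so the sketch runs anywhere; drop the echo on a real cluster.

```shell
# Default the dependency to 0 (no real job has ID 0, so Slurm silently ignores it)
submit_with_dep() {
    local dep=${1:-0}
    # echo the command instead of running it; remove "echo" for real use
    echo sbatch --dependency=afterok:${dep} job.sh
}
submit_with_dep            # no dependency given
submit_with_dep 3182711    # depend on job 3182711
```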

slurm get job id within bash

The job ID is assigned to the SLURM_JOB_ID environment variable. So in Python, you would get it with

import os
print(os.environ["SLURM_JOB_ID"])

Note that even though the job ID is an integer, the environment variable contains a character string, so you will need to call int() to convert it if you need a number.

SLURM requeue with new JOBID

A requeued job is still the same job, so the job ID will not change.

What you can do is prevent requeuing with the --no-requeue option. But then you will need to re-submit the job, either by hand or using a workflow manager.

Another option is to append the restart count to the folder name. For instance, if your submission script has lines such as

WORKDIR=/some/path/${SLURM_JOB_ID}
mkdir -p $WORKDIR
cd $WORKDIR

you can replace it with

WORKDIR=/some/path/${SLURM_JOB_ID}${SLURM_RESTART_COUNT}
mkdir -p $WORKDIR
cd $WORKDIR

Upon first run, the $SLURM_RESTART_COUNT will be unset, leaving the original behaviour, but then, it will be set to 1, 2, and so on, effectively suffixing the job ID with the requeue number.
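To illustrate the suffixing with made-up values (set by hand here; inside a real job the scheduler sets both variables):

```shell
SLURM_JOB_ID=3182711              # made-up job ID for demonstration
unset SLURM_RESTART_COUNT         # first run: the variable is not set
echo "/some/path/${SLURM_JOB_ID}${SLURM_RESTART_COUNT}"   # /some/path/3182711
SLURM_RESTART_COUNT=1             # after the first requeue
echo "/some/path/${SLURM_JOB_ID}${SLURM_RESTART_COUNT}"   # /some/path/31827111
```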

For the name of the output file, you can use --open-mode=append to avoid overwriting the output file when the job restarts.


