Use slurm job id
You can do something like this:
RES=$(sbatch simulation) && sbatch --dependency=afterok:${RES##* } postprocessing
The RES variable will hold the output of the sbatch command, something like "Submitted batch job 102045". The construct ${RES##* } isolates the last word (it removes the longest prefix ending in a space; see "Shell Parameter Expansion" in the Bash manual), in the current case the job ID. The && part ensures you do not try to submit the second job if the first submission fails.
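To see what ${RES##* } does without submitting anything, the sketch below applies the same expansion to a sample sbatch output line. The helper name extract_job_id is hypothetical, and the string is an example rather than real scheduler output.

```shell
#!/bin/bash
# Sketch: isolate the job ID from sbatch's default "Submitted batch job N" line.
# extract_job_id is a hypothetical helper; the sample string below is made up.

extract_job_id() {
  res=$1
  # ##* removes the longest prefix ending in a space, leaving only the last word.
  printf '%s\n' "${res##* }"
}

RES="Submitted batch job 102045"
echo "sbatch --dependency=afterok:$(extract_job_id "$RES") postprocessing"
# prints: sbatch --dependency=afterok:102045 postprocessing
```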
Use slurm JobID as input?
You should use the environment variable $SLURM_JOBID in make-dir.sh:
#!/bin/bash
echo $SLURM_JOBID
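A make-dir.sh built on this variable might look like the sketch below. The BASE_DIR variable, the make_job_dir function and the "nojob" fallback (used when the script runs outside Slurm, where SLURM_JOBID is unset) are hypothetical choices, not part of the original question.

```shell
#!/bin/bash
# make-dir.sh (sketch): create a directory named after the current Slurm job ID.
# BASE_DIR, make_job_dir and the "nojob" fallback are hypothetical.

make_job_dir() {
  base=${BASE_DIR:-.}
  # SLURM_JOBID is set by Slurm inside the job; outside a job it is unset.
  dir="$base/${SLURM_JOBID:-nojob}"
  mkdir -p "$dir" && printf '%s\n' "$dir"
}

# Inside a job script:
# WORKDIR=$(make_job_dir) && cd "$WORKDIR"
```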
Save slurm job ID as you submit the job with sbatch
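This can be done with the --parsable option, which makes sbatch print just the job ID instead of the full "Submitted batch job" message: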
job_id=$(sbatch --parsable test.sh)
echo $job_id
17211434
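According to the sbatch documentation, --parsable prints the job ID followed, when a cluster name is present, by a semicolon and that name. On such setups a small helper (hypothetical name parse_job_id) can strip the suffix so that only the numeric ID is kept:

```shell
#!/bin/bash
# Sketch: keep only the job ID from "sbatch --parsable" output,
# which may look like "17211434" or "17211434;clustername".
parse_job_id() {
  out=$1
  # %%;* removes the first ";" and everything after it (no-op if there is none).
  printf '%s\n' "${out%%;*}"
}

# Usage inside a submission script:
# job_id=$(parse_job_id "$(sbatch --parsable test.sh)")
```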
How to pass the SLURM-jobID as an input argument to python?
The SLURM_JOBID environment variable is made available to the job processes only, not to the process that submits the jobs. The job ID is returned by the sbatch command, so if you want it in a variable, you need to assign it.
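for layer in 1 2 3; do   # hypothetical values; the loop header was not shown
  for dc in 0.1 0.2; do  # hypothetical values
    # \${SLURM_JOBID} is escaped so that the job, not the submitting shell, expands it
    SLURM_JOBID=$(sbatch --parsable -N 1 -n 1 --mem=20G --mail-type=END --gres=gpu:V100:3 --wrap="singularity --noslurm tensorflow_19.03-py3.simg python run_CNN_dynlayer.py \${SLURM_JOBID} ${layer} ${dc}")
    sleep 1 # pause 1s to be kind to the scheduler...
    echo "jobid: ${SLURM_JOBID}"
    echo " --- next --- "
  done
done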
Note the use of the command substitution $() together with the --parsable argument of sbatch.
Note also that the line "Submitted batch job 3182711" of the current output will no longer appear, because with --parsable sbatch prints only the job ID, which is captured into the SLURM_JOBID variable.
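If the goal is for the Python script to receive its own job ID, note that ${SLURM_JOBID} written unescaped inside the double-quoted --wrap string is expanded by the submitting shell, before the job exists. Escaping the dollar sign defers expansion to the "sh" script that sbatch wraps around the command, where Slurm has set SLURM_JOBID. The sketch below (hypothetical helper build_wrap, example values for layer and dc) only builds the string, so it runs without Slurm:

```shell
#!/bin/bash
# Sketch: build a --wrap command in which ${SLURM_JOBID} is left unexpanded,
# so that it is resolved at run time inside the job.
# build_wrap and the argument values below are hypothetical.
build_wrap() {
  layer=$1
  dc=$2
  # \$ keeps the dollar sign literal; $layer and $dc are expanded right away.
  printf '%s\n' "python run_CNN_dynlayer.py \${SLURM_JOBID} ${layer} ${dc}"
}

build_wrap 3 0.5
# prints: python run_CNN_dynlayer.py ${SLURM_JOBID} 3 0.5
# Usage: sbatch --parsable --wrap="$(build_wrap "$layer" "$dc")"
```

Inside the job, the Python script can alternatively read the ID from its environment, as described in the next answer.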
SLURM dependency on nonexistent job ID
What happens if some or all IDs in $dependencylist are not valid job IDs or have never been submitted?
From a test made with Slurm 20.02.7, it appears that if the job ID is unknown to slurmctld (either not submitted yet, or a completed job whose record has already been purged because it is older than the configured MinJobAge), the option is silently ignored. scontrol show job then reports Dependency=(null). This does not change afterwards, even if a job with that ID appears.
Alternatively, how can I provide empty arguments to sbatch?
You can take advantage of the above-described behaviour by using "0" as job ID when no dependency is required.
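A minimal way to apply this trick is to default an empty dependency list to 0 with ${var:-0}. The function name dependency_option is hypothetical, and the approach relies on the behaviour observed with Slurm 20.02.7, which may differ in other versions.

```shell
#!/bin/bash
# Sketch: always produce a --dependency option, falling back to job ID 0
# (assumed unknown to slurmctld, hence ignored) when the list is empty.
dependency_option() {
  dependencylist=$1
  printf '%s\n' "--dependency=afterok:${dependencylist:-0}"
}

# Usage: sbatch "$(dependency_option "$dependencylist")" job.sh
```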
slurm get job id within bash
The job ID is assigned to the SLURM_JOB_ID environment variable. So in Python, you would get it with
import os
print(os.environ["SLURM_JOB_ID"])
Note that even though the job ID is an integer, the environment variable contains a character string, so you will need to call the int() function to convert it.
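In bash itself the same variable can be read directly. The sketch below (hypothetical function current_job_id) prints the ID and fails with a message when the variable is not set to a number, for example when the script is run outside Slurm:

```shell
#!/bin/bash
# Sketch: print the Slurm job ID, or fail when not running inside a job.
current_job_id() {
  case ${SLURM_JOB_ID:-} in
    ''|*[!0-9]*)
      echo "SLURM_JOB_ID is not set to a numeric job ID" >&2
      return 1
      ;;
    *)
      printf '%s\n' "$SLURM_JOB_ID"
      ;;
  esac
}

# Usage inside a job script:
# job_id=$(current_job_id) || exit 1
```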
SLURM requeue with new JOBID
A requeued job is still the same job, so the job ID will not change.
What you can do is prevent requeuing with the --no-requeue option. But then you will need to re-submit the job, either by hand or using a workflow manager.
Another option is to append the restart count to the folder name. For instance, if your submission script has a line such as
WORKDIR=/some/path/${SLURM_JOB_ID}
mkdir -p $WORKDIR
cd $WORKDIR
you can replace it with
WORKDIR=/some/path/${SLURM_JOB_ID}${SLURM_RESTART_COUNT}
mkdir -p $WORKDIR
cd $WORKDIR
Upon the first run, $SLURM_RESTART_COUNT will be unset, leaving the original behaviour, but on subsequent runs it will be set to 1, 2, and so on, effectively suffixing the job ID with the requeue number.
For the name of the output file, you can use --open-mode=append to avoid overwriting the output file when the job restarts.
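Putting the two ideas together, a requeue-friendly job script might start like the sketch below. The --requeue directive, the job-%j.out file name, BASE_DIR and the workdir_for_run helper are assumptions for illustration. The helper also inserts an underscore between the job ID and the restart count, so that job 12 after its third restart (12_3) cannot be confused with job 123; this is a small variation on the plain concatenation described above.

```shell
#!/bin/bash
# Sketch of a requeue-friendly job script header (options are assumptions).
# Allow Slurm to requeue the job:
#SBATCH --requeue
# Append to the same output file across restarts (%j is the job ID):
#SBATCH --open-mode=append
#SBATCH --output=job-%j.out

# Hypothetical base directory standing in for /some/path.
BASE_DIR=${BASE_DIR:-/some/path}

# Print a per-run directory: <base>/<jobid> on the first run,
# <base>/<jobid>_<n> after the n-th requeue.
workdir_for_run() {
  printf '%s\n' "$BASE_DIR/${SLURM_JOB_ID}${SLURM_RESTART_COUNT:+_$SLURM_RESTART_COUNT}"
}

# Inside the job:
# WORKDIR=$(workdir_for_run) && mkdir -p "$WORKDIR" && cd "$WORKDIR"
```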