How to Find from Where a Job Is Submitted in Slurm

How to find from where a job is submitted in SLURM?

You can use the scontrol command to see the job details:

$ scontrol show job <jobid>

For example, for a running job on our SLURM cluster:

$ scontrol show job 1665191
JobId=1665191 Name=tasktest
...
Shared=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/lustre/work/.../slurm_test/task.submit
WorkDir=/lustre/work/.../slurm_test

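The field you are looking for is WorkDir (the last line in this output). Unless the job was submitted with --chdir (-D), it is the directory from which sbatch was invoked.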
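If you need this value in a script, a small filter can pull a single field out of the scontrol output. The sketch below assumes the field starts its own line, as WorkDir and Command do in the example above. The function name scontrol_field is hypothetical. Unlike awk -F= '{print $2}', it keeps everything after the first "=", so a path that itself contains "=" is not truncated.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one field from `scontrol show job`
# output read on stdin. Assumes the field begins its own line (after
# optional indentation), as WorkDir= and Command= do in typical output.
scontrol_field() {
    local field=$1
    # Drop leading whitespace plus "FIELD=", and print only matching lines.
    sed -n "s/^[[:space:]]*${field}=//p"
}

# Usage (the job id is only an example):
#   scontrol show job 1665191 | scontrol_field WorkDir
```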

How to get original location of script used for SLURM job?

You can get the initial (i.e. at submit time) location of the submission script from scontrol like this:

scontrol show job "$SLURM_JOB_ID" | awk -F= '/Command=/{print $2}'

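So you can replace the realpath $0 part of your script with the above. Inside a batch job, $0 points to the copy of the script that Slurm stores on the compute node, not to the original file. This only works within a Slurm allocation, of course, so if you want the script to work in any situation, you will need some logic like: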

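if [ -n "$SLURM_JOB_ID" ]; then
    # Inside a Slurm job: ask scontrol for the script path recorded at submit time
    THEPATH=$(scontrol show job "$SLURM_JOB_ID" | awk -F= '/Command=/{print $2}')
else
    # Outside Slurm: resolve the script's own path
    THEPATH=$(realpath "$0")
fi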

and then proceed with

SHARED_PATH=$(dirname "$(dirname "$THEPATH")")

How can I find out the command (batch script filename) of a finished SLURM job?

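By default, Slurm does not store the command in the accounting database. (Recent Slurm releases can optionally store the batch script if the administrator sets AccountingStoreFlags=job_script; sacct -j <jobid> --batch-script then retrieves it.) Otherwise, there are two workarounds: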

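For a single user: use the JobName or Comment field to store the script name at submission time. JobName is always stored in the database, and Comment may be stored depending on the site's accounting configuration. This approach is error-prone, though, because it relies on setting the field correctly at every submission.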

Cluster-wide: enable the Elasticsearch job completion plugin (JobCompType=jobcomp/elasticsearch in slurm.conf). It stores not only the script name but also the whole contents of the script.
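The single-user workaround can be sketched as a small submission wrapper, assuming you always submit through it. The name sbatch_named is hypothetical. The wrapper records the script's file name as the job name and its full path as the comment. --job-name and --comment are standard sbatch options, but whether the comment reaches the database depends on the site's accounting configuration.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: submit a batch script while recording its name and
# full path in fields that sacct can display later.
# Usage: sbatch_named SCRIPT [extra sbatch options...]
sbatch_named() {
    local script=$1
    shift
    # sbatch options must come before the script name on the command line.
    sbatch --job-name="$(basename "$script")" \
           --comment="$(realpath "$script")" \
           "$@" "$script"
}
```

After the job finishes, something like sacct -j <jobid> --format=JobID,JobName,Comment should show the recorded values.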

Slurm job, knowing what node it is on

A simple, effective, and commonly used way to record in the job output which node the job ran on is to add

srun hostname

to your job script (it prints one line per task). The job ID is also available inside the job script through the SLURM_JOB_ID environment variable, so you can use

sstat -j "$SLURM_JOB_ID"

in your job script to get information about the job's running steps, such as resource usage and the nodes they run on.
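The same idea extends to other variables Slurm sets in the job environment. The sketch below prints the node running the batch script, the allocated node list, and the host and directory from which sbatch was invoked. The function name log_job_location is hypothetical. SLURM_JOB_NODELIST, SLURM_SUBMIT_HOST and SLURM_SUBMIT_DIR are standard sbatch output environment variables, and the "unknown" fallbacks only matter outside a job.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the top of a batch script: record where the job
# runs and where it was submitted from. Each value falls back to "unknown"
# when the function is run outside a Slurm job.
log_job_location() {
    printf '%-12s %s\n' "Job ID:"      "${SLURM_JOB_ID:-unknown}"
    printf '%-12s %s\n' "Batch host:"  "$(uname -n)"   # usually the same as hostname
    printf '%-12s %s\n' "Node list:"   "${SLURM_JOB_NODELIST:-unknown}"
    printf '%-12s %s\n' "Submit host:" "${SLURM_SUBMIT_HOST:-unknown}"
    printf '%-12s %s\n' "Submit dir:"  "${SLURM_SUBMIT_DIR:-unknown}"
}
```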

Do submitted jobs take a copy of the source? Queued jobs?

The sbatch command creates a copy of the submission script and a snapshot of the environment, and saves them in the directory given by the StateSaveLocation configuration parameter. The script can therefore be modified after submission without affecting the job.
That is not the case for the files used by the submission script. If your submission script starts an executable, the job will see the version of the executable that exists at the time the job starts.
Modifying the program before the job starts means the new version is run. Modifying it during the run (i.e. after it has already been read from disk into memory) means the old version is run.
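The behavior can be reproduced locally without Slurm. The sketch below only simulates submission: a plain cp stands in for the copy that sbatch saves, and all file names are hypothetical. Running demo_copy_semantics prints "script v1" followed by "program v2". The job runs the script as it was at submission, but the program it calls as it is at start time.

```shell
#!/usr/bin/env bash
# Local simulation (no Slurm involved): the job script is copied at
# "submission", while the program it calls is read only when it "starts".
demo_copy_semantics() {
    local work spool
    work=$(mktemp -d)
    spool=$(mktemp -d)   # stands in for Slurm's saved copy of the script

    # The program the job calls, and the job script itself (version 1).
    printf '#!/bin/sh\necho program v1\n' > "$work/program.sh"
    printf '#!/bin/sh\necho script v1\nsh "%s/program.sh"\n' "$work" > "$work/job.sh"

    # "Submit": keep a copy of the script, as sbatch does.
    cp "$work/job.sh" "$spool/job.sh"

    # Edit both files while the job is still "queued".
    printf '#!/bin/sh\necho script v2\n' > "$work/job.sh"
    printf '#!/bin/sh\necho program v2\n' > "$work/program.sh"

    # "Start" the job from the stored copy.
    sh "$spool/job.sh"

    rm -rf "$work" "$spool"
}
```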

How can I get detailed job run info from SLURM (e.g. like that produced for standard output by LSF)?

At the end of each job script, I usually insert

sstat -j "$SLURM_JOB_ID.batch" --format=JobID,MaxVMSize

to add RAM usage to the standard output.
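A variant of this, sketched under the assumption that the job script runs in bash, registers the report with trap ... EXIT. The report then runs whenever the batch script exits, including early exits under set -e. The function names are hypothetical. MaxRSS is added next to MaxVMSize as a second standard sstat field.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: print memory usage of the batch step when the job
# script exits, however it exits.
report_usage() {
    # sstat only reports on running steps; the batch step is still running
    # while its own EXIT trap executes.
    sstat -j "${SLURM_JOB_ID}.batch" --format=JobID,MaxVMSize,MaxRSS
}

install_usage_report() {
    trap report_usage EXIT
}

# In a job script, call this near the top:
#   install_usage_report
```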


