Slurm Quick Reference

Essential Commands

Command                   Description
sinfo                     Show partitions and node status
squeue                    Show the job queue
sbatch script.sh          Submit a batch job
scancel <job-id>          Cancel a job
scancel -u $USER          Cancel all of your jobs
scontrol show job <job-id>  Show detailed job information
sacct -j <job-id>         Show job accounting information, including after the job completes
watch squeue              Refresh the queue view periodically (every 2 seconds by default)
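
As a rough sketch of how these commands chain together, the Bash helpers below submit a script and then inspect the resulting job. The function names extract_job_id and submit_and_inspect are hypothetical. The sketch assumes only that sbatch --parsable prints the job ID alone, optionally followed by ";" and a cluster name.

```bash
# Hypothetical helper: keep only the job ID from `sbatch --parsable` output,
# which is either "12345" or "12345;clustername".
extract_job_id() {
    printf '%s\n' "${1%%;*}"
}

# Hypothetical helper: submit a script, then show its queue entry and details.
submit_and_inspect() {
    local script=$1 raw job_id
    raw=$(sbatch --parsable "$script") || return 1
    job_id=$(extract_job_id "$raw")
    squeue -j "$job_id"
    scontrol show job "$job_id"
}
```

On a login node, submit_and_inspect script.sh would submit the job and print its queue entry followed by the detailed job record.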

Common Job Script Directives

Directive            Description                      Example
--job-name           Job name                         #SBATCH --job-name=cfd-sim
--partition          Target partition (queue)         #SBATCH --partition=hpc6a
--nodes              Number of nodes                  #SBATCH --nodes=2
--ntasks-per-node    Tasks per node                   #SBATCH --ntasks-per-node=96
--time               Wall-time limit (HH:MM:SS)       #SBATCH --time=04:00:00
--output             Stdout file (%j = job ID)        #SBATCH --output=out_%j.log
--error              Stderr file (%j = job ID)        #SBATCH --error=err_%j.log
--exclusive          Exclusive access to the nodes    #SBATCH --exclusive
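
To make the --time and --output values above concrete, the Bash sketch below converts an HH:MM:SS limit to seconds and shows the file name a %j pattern yields for a given job ID. Both function names are hypothetical, and the time parser assumes the HH:MM:SS form only; other formats that --time accepts (such as days-hours) are not handled.

```bash
# Hypothetical helper: convert an HH:MM:SS wall-time string to seconds.
# Other formats accepted by --time (such as days-hours) are not handled.
walltime_to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    # 10# forces base 10 so values like "08" are not parsed as octal.
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Hypothetical helper: show the file name produced from a pattern
# containing %j for a given job ID.
expand_job_id() {
    printf '%s\n' "$1" | sed "s/%j/$2/g"
}

walltime_to_seconds 04:00:00        # prints 14400
expand_job_id out_%j.log 12345      # prints out_12345.log
```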

Job Script Template

#!/bin/bash
#SBATCH --job-name=my-job
#SBATCH --partition=hpc6a
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=96
#SBATCH --time=02:00:00
#SBATCH --output=output_%j.log
#SBATCH --error=error_%j.log

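# Load software (replace <package> with the Spack package you need)
spack load <package>

# Run one MPI rank per Slurm task (nodes x ntasks-per-node)
mpirun -np "$SLURM_NTASKS" ./my_application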
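
The template passes $SLURM_NTASKS to mpirun. As a hedged sketch, the Bash helpers below check that this value matches nodes times tasks per node before launching. The function names are hypothetical, and the sketch assumes that Slurm sets SLURM_JOB_NUM_NODES, SLURM_NTASKS_PER_NODE, and SLURM_NTASKS inside the job, which holds when --nodes and --ntasks-per-node are both given as in the template.

```bash
# Hypothetical helper: compute the rank count implied by the job's shape.
# Falls back to 1 when a variable is unset (for example, outside a job).
expected_ranks() {
    local nodes=${SLURM_JOB_NUM_NODES:-1}
    local per_node=${SLURM_NTASKS_PER_NODE:-1}
    echo $(( nodes * per_node ))
}

# Hypothetical helper: succeed only if SLURM_NTASKS matches that product.
check_rank_count() {
    [ "${SLURM_NTASKS:-1}" -eq "$(expected_ranks)" ]
}
```

With the template's settings (2 nodes, 96 tasks per node), expected_ranks would print 192, and a line such as check_rank_count || exit 1 placed before mpirun would stop the job early on a mismatch.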

Node States

State    Meaning
idle~    Powered down (not running); launches when a job is submitted
idle%    Powering down (for example, after the idle timeout expires)
mix      Partially allocated
alloc    Fully allocated
drain    Not accepting new jobs (for example, marked for maintenance)
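
To see how many nodes are in each of these states, one option is to tally the compact state column that sinfo prints. The Bash sketch below is a minimal version; tally_states is a hypothetical name, and the helper only counts whatever state strings it receives on standard input.

```bash
# Hypothetical helper: read one node state per line on stdin and print
# "state count" pairs, sorted by state name.
tally_states() {
    sort | uniq -c | awk '{ print $2, $1 }'
}
```

A typical call would be sinfo -h -N -o '%t' | tally_states, where -h drops the header, -N prints one line per node, and %t is the compact state. A node that belongs to several partitions appears once per partition in that output, so counts may exceed the physical node count.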

Job States

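Code    State          Meaning
PD      Pending        Waiting for resources
CF      Configuring    Resources allocated; waiting for nodes to become ready (instances launching)
R       Running        Executing
CG      Completing     Finishing up
CD      Completed      Finished successfully (exit code 0)
F       Failed         Exited with a non-zero exit code or other failure
CA      Cancelled      Cancelled by the user or an administrator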
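
The state codes above can drive a simple wait loop. The Bash sketch below polls squeue until a job reaches one of the terminal states in the table (CD, F, CA) or disappears from the queue. Both function names and the 30-second default interval are assumptions for illustration, and the loop assumes a single job rather than a job array.

```bash
# Return success (0) if the compact state code is one of the terminal
# states listed in the table above: CD, F, or CA.
is_terminal_state() {
    case "$1" in
        CD|F|CA) return 0 ;;
        *)       return 1 ;;
    esac
}

# Hypothetical helper: poll squeue until the job reaches a terminal state
# or is no longer listed. %t is squeue's compact state code; -h drops the header.
wait_for_job() {
    local job_id=$1 interval=${2:-30} state
    while true; do
        state=$(squeue -h -j "$job_id" -o '%t' 2>/dev/null)
        if [ -z "$state" ] || is_terminal_state "$state"; then
            break
        fi
        sleep "$interval"
    done
}
```

After wait_for_job <job-id> returns, sacct -j <job-id> can be used to check how the job ended.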

For the full Slurm documentation, see the Slurm Quick Start Guide.