## 4.7 Submitting Jobs with Slurm
Slurm is the job scheduler used by ParallelCluster. It manages job queues, allocates compute nodes, and handles scaling automatically.
### Key Slurm Commands

| Command | Description |
|---|---|
| `sinfo` | List partitions (queues) and node status |
| `squeue` | List queued and running jobs |
| `sbatch script.sh` | Submit a job script |
| `scancel <job-id>` | Cancel a job |
| `scontrol show job <job-id>` | Show detailed job info |
### Understanding Node States

When you run `sinfo`, nodes show different states:

| State | Description |
|---|---|
| `idle~` | No instance running; one will launch when a job is submitted |
| `idle%` | Instance running; will shut down after the idle timeout (default 10 minutes) |
| `mix` | Instance partially allocated |
| `alloc` | Instance fully allocated |
### Job Status Codes

When monitoring with `squeue`, the `ST` column shows:

| Code | Status | Description |
|---|---|---|
| `PD` | Pending | Waiting for resource allocation |
| `CF` | Configuring | ParallelCluster is provisioning instances |
| `R` | Running | Job script is executing |
| `CG` | Completing | Job is finishing up |
### Writing a Job Script
A basic Slurm job script:
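A minimal sketch (the `#SBATCH` directives shown are common defaults; adjust the resource requests to your cluster):

```shell
#!/bin/bash
#SBATCH --job-name=hello       # name shown in squeue
#SBATCH --nodes=1              # number of compute nodes
#SBATCH --ntasks=1             # number of tasks (processes)
#SBATCH --output=%x_%j.out     # log file: %x = job name, %j = job ID

# Everything below runs on the allocated compute node.
echo "Hello from $(hostname)"
sleep 30                       # keep the job alive briefly so it is visible in squeue
```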
Submit it with:
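Assuming the script is saved as `hello.sh` (the filename is illustrative):

```shell
sbatch hello.sh    # on success, Slurm prints the assigned job ID
squeue             # confirm the job appears in the queue
```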
### Monitoring Jobs
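A few commands worth running while a job is in flight (the job ID `123` is a placeholder):

```shell
squeue -u "$USER"       # your queued and running jobs
watch -n 10 squeue      # refresh the queue view every 10 seconds
scontrol show job 123   # detailed info for one job
sinfo                   # node states as instances launch and terminate
```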
### How Auto-Scaling Works
ParallelCluster automatically manages compute nodes based on your job queue:
1. You submit a job with `sbatch`.
2. Slurm sees the resource request and signals ParallelCluster.
3. ParallelCluster launches the required compute instances.
4. Instances join the cluster and the job starts running.
5. When the job completes, instances stay idle for the cooldown period (default 10 minutes).
6. After the cooldown, idle instances are terminated.
This means you only pay for compute when jobs are actually running.
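One way to watch this cycle happen (a sketch; timing depends on the instance type and cooldown settings):

```shell
sbatch job.sh       # submit any job script
watch -n 10 sinfo   # watch node states change as instances launch, run, and terminate
```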
### Multiple Queues
Your cluster can have multiple queues with different instance types:
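A sketch of the relevant part of a ParallelCluster config (queue names and instance types are illustrative):

```yaml
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: cpu
      ComputeResources:
        - Name: c6i
          InstanceType: c6i.4xlarge
          MinCount: 0
          MaxCount: 10
    - Name: gpu
      ComputeResources:
        - Name: g5
          InstanceType: g5.xlarge
          MinCount: 0
          MaxCount: 4
```

Each queue becomes a Slurm partition, so you can target one with `sbatch -p gpu job.sh`.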
### Cleaning Up
When you’re done with your cluster, delete it to stop all charges:
- PCUI: Select your cluster → Click Delete
- CLI: `pcluster delete-cluster --cluster-name my-hpc-cluster`
> **Warning:** Deleting a cluster removes all resources, including storage. Back up your results to S3 before deleting.
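For example (the bucket name and paths are placeholders; `/shared` assumes a shared storage mount at that path):

```shell
aws s3 sync /shared/results s3://my-results-bucket/results/
```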
Having issues? Check the Troubleshooting & FAQs.