Job Issues

Job Stuck in Pending (PD)

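Configuring (CF): If squeue shows the job in the CF state, ParallelCluster is still launching compute instances; this is normal, so wait a few minutes
Resources unavailable: Run sinfo to check whether the requested partition has nodes available or whether they are marked down
Insufficient quota: Your AWS account may not have enough EC2 instance quota for the requested instance type; check Service Quotas in the AWS console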
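
To see which of these cases applies, squeue can report each job's state and Slurm's pending reason. The sketch below wraps that in two small shell functions. hint_for_reason and show_pending are hypothetical helpers, not Slurm commands, and the hint text is a rough guide rather than official Slurm wording.

```shell
#!/usr/bin/env bash
# Sketch: list your pending/configuring jobs with Slurm's reason code and a rough hint.
# hint_for_reason and show_pending are hypothetical helpers, not Slurm commands.

# Map a Slurm reason code to a troubleshooting hint (not exhaustive).
hint_for_reason() {
  case "$1" in
    None)             echo "no blocking reason; if state is CONFIGURING, instances are booting" ;;
    Resources)        echo "waiting for nodes; run sinfo to check availability" ;;
    Priority)         echo "other jobs are ahead in the queue" ;;
    ReqNodeNotAvail*) echo "requested nodes are unavailable; check sinfo and EC2 quota" ;;
    Dependency)       echo "waiting for a job it depends on" ;;
    *)                echo "see the squeue documentation for this reason code" ;;
  esac
}

# Print job ID, partition, state, and reason for your PD and CF jobs.
show_pending() {
  # -h: no header; %i job ID, %P partition, %T state, %r reason
  squeue -u "$USER" -t PD,CF -h -o "%i %P %T %r" |
    while read -r id part state reason; do
      echo "$id ($part, $state): $reason -> $(hint_for_reason "$reason")"
    done
}
```

Source this on the head node and run show_pending. The reason field is read last so that multi-word reasons such as "ReqNodeNotAvail, UnavailableNodes:..." stay intact.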

Job Fails Immediately

Check the error log: cat error_<job-id>.log
Verify the software is installed (spack find) and loaded in your job script (spack load <package>)
Check file permissions on your job script: chmod +x my-job.sh
Ensure the partition name in #SBATCH --partition= matches an existing queue; sinfo lists the available partitions
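
Part of this checklist can be run before you submit. The sketch below reads the first #SBATCH --partition= line from a job script and compares it with the partition names sinfo reports. script_partition and check_job_script are hypothetical helpers, and the parser only understands the --partition=NAME form (not -p NAME or --partition NAME).

```shell
#!/usr/bin/env bash
# Sketch of a pre-submit check for a job script.
# script_partition and check_job_script are hypothetical helpers, not Slurm tools.

# Print the value of the first "#SBATCH --partition=NAME" line, if any.
script_partition() {
  sed -n 's/^#SBATCH[[:space:]]\{1,\}--partition=\([^[:space:]]\{1,\}\).*/\1/p' "$1" | head -n 1
}

check_job_script() {
  local script="$1" part
  if [ ! -f "$script" ]; then
    echo "missing: $script"
    return 1
  fi
  [ -x "$script" ] || echo "not executable: run chmod +x $script"
  part=$(script_partition "$script")
  if [ -z "$part" ]; then
    echo "no --partition line found; Slurm will use the default partition"
  elif ! sinfo -h -o "%R" | grep -qx "$part"; then
    # %R prints partition names without the '*' default marker
    echo "partition '$part' not found; available: $(sinfo -h -o '%R' | tr '\n' ' ')"
  fi
}
```

Run check_job_script my-job.sh on the head node; it prints nothing when the script exists, is executable, and names a partition that sinfo knows about.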

Job Runs But Produces Wrong Results

Verify input files are in the correct location: /shared is visible to every node, but /tmp is local to each node
Check that the number of MPI ranks matches your node and core count (ranks = nodes × tasks per node)
Review the output log for warnings
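
The rank count should equal nodes × tasks per node for the allocation. Assuming the job script requests both --nodes and --ntasks-per-node, Slurm exports SLURM_JOB_NUM_NODES and SLURM_NTASKS_PER_NODE inside the job, so a small guard can stop a mismatched launch before it starts. check_ranks is a hypothetical helper, not part of Slurm or MPI.

```shell
#!/usr/bin/env bash
# Sketch: compare the planned MPI rank count with the Slurm allocation.
# Assumes the job script contains, for example:
#   #SBATCH --nodes=2
#   #SBATCH --ntasks-per-node=4
# check_ranks is a hypothetical helper.

check_ranks() {
  local np="$1"   # ranks you intend to pass to mpirun -np
  local nodes="${SLURM_JOB_NUM_NODES:-}" per_node="${SLURM_NTASKS_PER_NODE:-}"
  if [ -z "$nodes" ] || [ -z "$per_node" ]; then
    echo "run inside a job that sets --nodes and --ntasks-per-node" >&2
    return 2
  fi
  local expected=$(( nodes * per_node ))
  if [ "$np" -ne "$expected" ]; then
    echo "mismatch: $np ranks vs $nodes nodes x $per_node tasks = $expected" >&2
    return 1
  fi
  echo "ok: $np ranks = $nodes x $per_node"
}
```

In a job script this might be used as check_ranks 8 && mpirun -np 8 ./my_app, where my_app stands in for your program.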

Nodes Scale Down Too Quickly

By default, compute nodes are shut down after 10 minutes of idle time. If nodes shut down before your next job starts, increase ScaledownIdletime (in minutes) under Scheduling/SlurmSettings in your cluster config.
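
In a ParallelCluster 3 config, the setting sits in the Slurm section of the scheduling block. The sketch below writes an example fragment to a file; the file name scaledown-snippet.yaml and the value 30 are illustrative only, and the fragment is meant to be merged into your existing cluster config, not used on its own.

```shell
#!/usr/bin/env bash
# Sketch: example Scheduling fragment with a 30-minute idle timeout.
# File name and value are illustrative; merge into your real cluster config.
cat > scaledown-snippet.yaml <<'EOF'
Scheduling:
  Scheduler: slurm
  SlurmSettings:
    ScaledownIdletime: 30
EOF
```

After editing the real config, apply it with pcluster update-cluster. Check the ParallelCluster documentation for this setting's update policy, since some settings can only be changed while the compute fleet is stopped.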