Job Issues

Job Stuck in Pending (PD)

  • Configuring (CF): ParallelCluster is launching instances. This is normal; wait a few minutes.
  • Resources unavailable: Run sinfo to check whether nodes in the requested partition are available.
  • Insufficient quota: Your AWS account may not have enough EC2 instance quota for the requested instance type.
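
The checks above can be sketched as a small shell helper that turns a job's state and reason into a next step. This is only an illustration: pending_hint is a hypothetical function (not part of Slurm or ParallelCluster), the reason names it matches are a small subset of what Slurm can report, and JOB_ID is a placeholder.

```shell
# Hypothetical helper: map a Slurm job state and reason to a suggested next step.
pending_hint() {
  local state="$1" reason="$2"
  case "$state" in
    CONFIGURING)
      echo "instances are launching; wait a few minutes" ;;
    PENDING)
      case "$reason" in
        Resources|Priority) echo "waiting for nodes; check sinfo" ;;
        *) echo "see reason '$reason'; check sinfo and instance quota" ;;
      esac ;;
    *)
      echo "job is not pending (state: $state)" ;;
  esac
}

# On a cluster, the state and reason can be read from squeue
# (%T = long state name, %r = reason, -h = no header):
#   squeue -j "$JOB_ID" -h -o "%T %r"
# and passed in as: pending_hint <state> <reason>
```

For example, pending_hint PENDING Resources prints "waiting for nodes; check sinfo". For the full picture of a single job, scontrol show job <job-id> lists the state, reason, and requested resources together.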

Job Fails Immediately

  1. Check the error log: cat error_<job-id>.log
  2. Verify the software is installed and loaded: spack find, spack load <package>
  3. Check file permissions on your job script: chmod +x my-job.sh
  4. Ensure the partition name in #SBATCH --partition= matches an actual queue
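
Step 4 is easy to get wrong by a typo, so here is a minimal sketch of how the partition check might be scripted. script_partition is a hypothetical helper and my-job.sh is the example script name used above; the sketch assumes the directive is written as --partition=<name>, --partition <name>, or -p <name>.

```shell
# Hypothetical helper: print the partition requested by the first matching
# #SBATCH directive in a job script.
script_partition() {
  sed -n -E 's/^#SBATCH[[:space:]]+(--partition[= ]|-p[[:space:]]*)([^[:space:]]+).*/\2/p' "$1" | head -n 1
}

# On a cluster, compare the result with the partitions Slurm knows about
# (sinfo's %R field is the partition name, -h suppresses the header):
#   sinfo -h -o "%R" | grep -qx "$(script_partition my-job.sh)" \
#     || echo "partition not found in sinfo output"
```

If the helper prints nothing, the script has no partition directive and the job goes to the cluster's default partition.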

Job Runs But Produces Wrong Results

  • Verify input files are in the correct location (remember that /shared is visible on every node, while /tmp is local to each node)
  • Check that the number of MPI ranks matches your node and core count
  • Review the output log for warnings
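
The rank check above is simple arithmetic, and a job script could perform it before launching the MPI program. The following is a sketch under assumptions: check_ranks is a hypothetical helper, and it treats "slots" as nodes multiplied by tasks per node, which may not match every job layout.

```shell
# Hypothetical helper: compare the intended MPI rank count with
# nodes x tasks-per-node. Returns non-zero on a mismatch.
check_ranks() {
  local ranks="$1" nodes="$2" per_node="$3"
  local expected=$(( nodes * per_node ))
  if [ "$ranks" -eq "$expected" ]; then
    echo "ok: $ranks ranks"
  else
    echo "mismatch: $ranks ranks requested, $expected slots allocated"
    return 1
  fi
}

# Inside a job script, Slurm's environment variables can supply the values
# (SLURM_NTASKS_PER_NODE is only set when --ntasks-per-node was requested):
#   check_ranks "$SLURM_NTASKS" "$SLURM_JOB_NUM_NODES" "$SLURM_NTASKS_PER_NODE" || exit 1
```

For example, check_ranks 8 2 4 prints "ok: 8 ranks", while check_ranks 6 2 4 reports a mismatch and returns a non-zero status.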

Nodes Scale Down Too Quickly

The default idle timeout is 10 minutes. If nodes shut down before your next job starts, increase ScaledownIdletime (a value in minutes, under Scheduling / SlurmSettings) in your cluster config.
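
Before changing the setting, it can help to confirm what the cluster config currently says. The helper below is a rough sketch: scaledown_idletime is a hypothetical function, cluster-config.yaml and my-cluster are placeholder names, and the parsing is a simple line match rather than real YAML parsing.

```shell
# Hypothetical helper: print the ScaledownIdletime value (minutes) found in a
# cluster config file, or 10 (the default) if the key is not set.
# Expected shape of the relevant config section:
#   Scheduling:
#     SlurmSettings:
#       ScaledownIdletime: 30
scaledown_idletime() {
  awk '$1 == "ScaledownIdletime:" { print $2; found = 1; exit }
       END { if (!found) print 10 }' "$1"
}

# After editing the file, the change is applied with an update such as:
#   pcluster update-cluster --cluster-name my-cluster \
#     --cluster-configuration cluster-config.yaml
# Some settings may require stopping the compute fleet first; check the
# update policy for ScaledownIdletime in the ParallelCluster documentation.
```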