Job Issues
Job Stuck in Pending (PD)
- Configuring (CF): ParallelCluster is launching instances. This is normal; wait a few minutes
- Resources unavailable: Check `sinfo` to see if nodes are available
- Insufficient quota: Your account may not have enough instance quota for the requested type
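The checks above can be gathered into one helper. This is a minimal sketch, not an official tool: the function name `why_pending` is hypothetical, and it assumes the standard Slurm client commands (`squeue`, `sinfo`, `scontrol`) are on your PATH on the head node.

```bash
#!/usr/bin/env bash
# why_pending: hypothetical helper that prints what Slurm knows about waiting jobs.
why_pending() {
  local job_id=${1:-}

  # Your pending jobs; the last column (%R) is the scheduler's stated reason,
  # for example Resources or Priority.
  squeue -u "$USER" -t PENDING -o '%.10i %.12P %.20j %.2t %R'

  # Partition and node states, to see whether any nodes are available.
  sinfo

  # Full record for one job (includes a Reason= field), if an ID was given.
  if [ -n "$job_id" ]; then
    scontrol show job "$job_id"
  fi
}
```

Call it as `why_pending` or `why_pending <job-id>`. In `sinfo` output, a `~` suffix on a node state generally means the node is powered down and `#` means it is powering up, which matches the Configuring case.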
Job Fails Immediately
- Check the error log: `cat error_<job-id>.log`
- Verify the software is installed and loaded: `spack find`, `spack load <package>`
- Check file permissions on your job script: `chmod +x my-job.sh`
- Ensure the partition name in `#SBATCH --partition=` matches an actual queue
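The permission and partition checks can be scripted. The sketch below is an illustration only: the function names `requested_partition` and `check_job_script` are hypothetical, and it assumes the partition is set on a single `#SBATCH --partition=<name>` or `#SBATCH -p <name>` line.

```bash
#!/usr/bin/env bash
# Print the partition a job script asks for (empty output if none is set).
requested_partition() {
  sed -n -E 's/^#SBATCH[[:space:]]+(--partition[=[:space:]]|-p[[:space:]]+)([^[:space:]]+).*/\2/p' "$1" | head -n 1
}

# Report common problems with a job script before it is submitted.
check_job_script() {
  local script=$1 part

  if [ ! -x "$script" ]; then
    echo "$script is not executable; run: chmod +x $script"
  fi

  part=$(requested_partition "$script")
  # Compare against the partition names that sinfo reports (%R = bare name).
  if [ -n "$part" ] && command -v sinfo >/dev/null 2>&1; then
    if ! sinfo -h -o '%R' | grep -qx -- "$part"; then
      echo "partition '$part' does not match any queue listed by sinfo"
    fi
  fi
}
```

Run `check_job_script my-job.sh` before `sbatch my-job.sh`; no output means neither check found a problem.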
Job Runs But Produces Wrong Results
- Verify input files are in the correct location (remember `/shared` is shared, `/tmp` is not)
- Check that the number of MPI ranks matches your node/core count
- Review the output log for warnings
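The rank check is simple arithmetic: ranks should equal nodes times tasks per node. The sketch below shows it as a shell function; the name `check_ranks` is hypothetical, and the three values are passed in explicitly so the check does not depend on which Slurm environment variables your job happens to set.

```bash
#!/usr/bin/env bash
# Compare the number of MPI ranks you plan to launch with nodes x tasks-per-node.
check_ranks() {
  local ranks=$1 nodes=$2 per_node=$3
  local expected=$(( nodes * per_node ))

  if [ "$ranks" -eq "$expected" ]; then
    echo "ok: $ranks ranks on $nodes node(s)"
  else
    echo "mismatch: $ranks ranks requested, but $nodes x $per_node = $expected"
    return 1
  fi
}
```

Inside a job script you might call `check_ranks "$SLURM_NTASKS" "$SLURM_JOB_NUM_NODES" "$SLURM_NTASKS_PER_NODE"` before the MPI launch line. Slurm only sets these variables when the matching options were given, so treat that call as an assumption about how the job was submitted.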
Nodes Scale Down Too Quickly
The default idle timeout is 10 minutes. If nodes shut down before your next job starts, you can adjust ScaledownIdletime in your cluster config.
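As a rough sketch of how that change might be applied on ParallelCluster 3 with the Slurm scheduler: the function name `apply_idle_timeout`, the cluster name, the config path, and the 30-minute value are all placeholders. It assumes the setting lives under `Scheduling` / `SlurmSettings` and that the compute fleet has to be stopped for the update; confirm both against the documentation for your version.

```bash
#!/usr/bin/env bash
# In the cluster config file (YAML), raise the timeout in minutes, for example:
#
#   Scheduling:
#     Scheduler: slurm
#     SlurmSettings:
#       ScaledownIdletime: 30
#
# apply_idle_timeout: hypothetical wrapper that pushes the edited config.
apply_idle_timeout() {
  local cluster=$1 config=$2

  # Stop the compute fleet (assumed to be required for this setting).
  # The request is asynchronous: wait until the fleet reports STOPPED.
  pcluster update-compute-fleet --cluster-name "$cluster" --status STOP_REQUESTED

  # Apply the edited configuration.
  pcluster update-cluster --cluster-name "$cluster" --cluster-configuration "$config"

  # Start the fleet again once the update has finished.
  pcluster update-compute-fleet --cluster-name "$cluster" --status START_REQUESTED
}
```

A longer timeout keeps idle instances running, and billed, for longer, so choose a value close to the usual gap between your jobs.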