4.4 Creating Your First Cluster
This walkthrough takes you from a configuration file to a running HPC cluster; provisioning typically completes in 10-15 minutes.
Cluster Architecture Overview
A typical ParallelCluster setup consists of:
| Component | Description |
|---|---|
| Head Node | Controls compute nodes, hosts the Slurm scheduler, and provides your login environment |
| Compute Nodes | Dynamically provisioned when jobs are submitted; scale to zero when idle |
| Shared Storage | FSx for Lustre or EBS volumes shared across all nodes (mounted at /shared) |
| Placement Groups | Keep instances physically close for maximum network performance |
| Scheduler | Slurm manages job queues and resource allocation |
Step 1: Prepare Your Configuration
ParallelCluster uses a YAML config file to describe your cluster. You can either:
- Use the PCUI wizard to build one interactively
- Write one manually (or use a template)
A config file defines your head node size, compute queues, storage, networking, and more. Because the configuration is a plain file, you can share it with colleagues so they can launch identical clusters.
Example Configuration
See the Reference section for full config file documentation and templates.
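As a starting point, here is a minimal sketch of a ParallelCluster 3.x configuration. The subnet ID, key pair name, and instance types are placeholders you must replace with values from your own account; consult the Reference section for the full schema.

```yaml
Region: us-east-1                  # placeholder: your AWS Region
Image:
  Os: alinux2
HeadNode:
  InstanceType: t3.medium
  Networking:
    SubnetId: subnet-12345678      # placeholder: your subnet ID
  Ssh:
    KeyName: my-key                # placeholder: your EC2 key pair
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: compute
      ComputeResources:
        - Name: compute-cr
          InstanceType: c5.xlarge
          MinCount: 0              # scale to zero when idle
          MaxCount: 4
      Networking:
        SubnetIds:
          - subnet-12345678        # placeholder: your subnet ID
SharedStorage:
  - MountDir: /shared
    Name: shared-ebs
    StorageType: Ebs
```

Save this as something like `cluster-config.yaml`; both PCUI and the CLI accept the same file.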
Step 2: Create the Cluster
Via PCUI (Recommended)
- Open the ParallelCluster UI
- Click Create Cluster
- Follow the step-by-step wizard to configure your cluster
- Use Dry Run to validate your configuration before deploying
- Click Create to launch
Via CLI
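With the CLI, a create looks like the following. The cluster name and config file name are examples; `--dryrun true` validates the configuration without provisioning any resources, mirroring the Dry Run step in PCUI.

```shell
# Validate the configuration first (no resources are created)
pcluster create-cluster \
  --cluster-name my-cluster \
  --cluster-configuration cluster-config.yaml \
  --dryrun true

# Launch the cluster
pcluster create-cluster \
  --cluster-name my-cluster \
  --cluster-configuration cluster-config.yaml
```

The command returns immediately with the cluster in `CREATE_IN_PROGRESS`; provisioning continues in the background.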
Step 3: Wait for Provisioning
The cluster will take 10-15 minutes to provision. You can monitor progress:
- PCUI: Watch the cluster status change from “Creating” to “Running”
- CLI: Run `pcluster list-clusters` to check status
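For more detail than the list view provides, you can query a single cluster. The cluster name below is a placeholder:

```shell
# Summary status of all clusters in the Region
pcluster list-clusters

# Detailed status, including the head node and any failure reasons
pcluster describe-cluster --cluster-name my-cluster
```

The cluster is ready to use once its status reaches `CREATE_COMPLETE`.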
Tip
Cluster names must be unique per AWS Region per account; only one cluster of a given name can exist at a time.
Key Configuration Concepts
| Concept | Description |
|---|---|
| Queues | Define groups of compute nodes with specific instance types |
| MinCount / MaxCount | Control auto-scaling bounds (set MinCount to 0 for cost savings) |
| EFA | Elastic Fabric Adapter for high-speed inter-node networking (100 Gbps) |
| Placement Groups | Keep nodes physically close for lowest latency |
| Shared Storage | FSx for Lustre provides high-performance shared filesystem |
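The queue-related concepts above map directly onto the config file. This sketch (instance type, subnet ID, and counts are placeholders) shows a queue that scales from zero, with EFA and a placement group enabled for tightly coupled jobs:

```yaml
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: hpc
      ComputeResources:
        - Name: hpc-cr
          InstanceType: c5n.18xlarge   # placeholder: an EFA-capable type
          MinCount: 0                  # scale to zero for cost savings
          MaxCount: 8                  # upper auto-scaling bound
          Efa:
            Enabled: true              # high-speed inter-node networking
      Networking:
        SubnetIds:
          - subnet-12345678            # placeholder: your subnet ID
        PlacementGroup:
          Enabled: true                # keep nodes physically close
```

Note that EFA requires an EFA-capable instance type, and placement groups are configured per queue under `Networking`.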
Warning
Cost Reminder: Compute nodes scale to zero when idle, but the head node and storage run continuously. Delete your cluster when you’re done to stop all charges.
Your cluster is running — time to connect: Connecting to Your Cluster