4.6 Working with Data & Software
Once connected to your head node, you’ll want to install software and get your data in place. ParallelCluster provides shared filesystems that make this straightforward.
Shared Filesystems
All nodes in your cluster share several filesystems:
| Mount Point | Type | Description |
|---|---|---|
| /shared | FSx for Lustre | High-performance shared storage for data and software |
| /home | NFS (EBS) | Home directories, shared across all nodes |
| /opt/slurm | NFS | Slurm installation, shared across all nodes |
You can verify shared mounts with:
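For example, listing the mounted filesystems (the exact devices and sizes depend on your cluster configuration):

```shell
# List all mounted filesystems with human-readable sizes; on a default
# ParallelCluster head node, look for /shared (Lustre) and the NFS
# exports /home and /opt/slurm in the output.
df -h
```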
**Tip:** Install software and store data in /shared so it’s accessible from both the head node and all compute nodes.
Installing Software with Spack
Spack is a package manager for supercomputers that makes installing scientific software easy. It supports Python, R, C, C++, and Fortran packages, and can target specific compilers and architectures.
Setting Up Spack
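A typical setup clones Spack onto the shared filesystem so that every node sees the same installation. The `/shared/spack` path below is a convention, not a requirement:

```shell
# Clone Spack onto shared storage so compute nodes can use it too
git clone --depth=1 https://github.com/spack/spack.git /shared/spack

# Make the `spack` command available in this and future shells
echo 'source /shared/spack/share/spack/setup-env.sh' >> ~/.bashrc
source /shared/spack/share/spack/setup-env.sh
```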
Enable the Binary Cache
The Spack Binary Cache provides pre-built packages, reducing install times dramatically:
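A sketch of enabling the cache; the mirror URL below points at Spack's public binary cache for one release series and should be adjusted to match your Spack version:

```shell
# Add Spack's public binary cache as a mirror (URL varies by release)
spack mirror add binary_mirror https://binaries.spack.io/releases/v0.21

# Trust the GPG keys the cached packages are signed with
spack buildcache keys --install --trust
```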
Installing Packages
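For example, installing and loading a package. GROMACS is just an illustration here; any Spack package name works, and specs can pin compilers or variants with Spack's usual `%compiler` / `+variant` syntax:

```shell
# Install a package; Spack resolves and builds/downloads all dependencies
spack install gromacs

# Make the installed package available in the current shell
spack load gromacs

# List everything installed so far
spack find
```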
Moving Data In and Out
SCP / SFTP
Transfer files to the head node using standard tools:
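For example (the key path, user name, and `HEAD_NODE_IP` placeholder below are illustrative; substitute your own values):

```shell
# Copy a local archive to shared storage on the head node
scp -i ~/.ssh/mykey.pem ./input.tar.gz ec2-user@HEAD_NODE_IP:/shared/data/

# Or browse and transfer files interactively over SFTP
sftp -i ~/.ssh/mykey.pem ec2-user@HEAD_NODE_IP
```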
AWS CLI (S3)
Transfer data to/from S3 buckets:
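For example (the bucket name `my-bucket` and the paths are placeholders):

```shell
# Download input data from S3 to shared storage
aws s3 cp s3://my-bucket/inputs/ /shared/data/ --recursive

# Upload results back to S3 (sync copies only new/changed files)
aws s3 sync /shared/results/ s3://my-bucket/results/
```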
Organizing Your Data
A recommended directory structure on /shared:
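One possible layout (the directory names are a convention; adapt them to your project):

```
/shared/
├── apps/       # software installations (e.g., Spack)
├── data/       # input datasets
├── results/    # job outputs worth keeping (back these up to S3)
└── scratch/    # temporary working files
```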
Storage Considerations
| Storage | Performance | Persistence | Cost |
|---|---|---|---|
| FSx for Lustre | Very high throughput | Deleted with cluster (SCRATCH) | Based on capacity |
| EBS (head node) | Standard SSD | Deleted with cluster | Based on size |
| S3 | High throughput for bulk | Persistent (independent of cluster) | Per GB/month |
**Warning:** FSx for Lustre SCRATCH storage is deleted when the cluster is deleted. Always back up important results to S3 before deleting your cluster.
Now that your software and data are ready, let’s submit some jobs: *Submitting Jobs with Slurm*.