Running on Slurm HPC Systems
mythos with the RayOptimizer can scale across multiple nodes on
Slurm-managed HPC clusters. This page covers practical advice for writing
sbatch scripts and tuning resource allocation.
sbatch Script Example
The following script launches a multi-node Ray cluster on Slurm and runs an
optimization using ray symmetric-run (requires Ray 2.50+):
#!/bin/bash
#SBATCH --job-name=mythos-oxdna-lp
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --nodes=4
#SBATCH --partition=cpu
#SBATCH --time=02:00:00
set -x
nodes=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
nodes_array=($nodes)
head_node=${nodes_array[0]}
port=6379
ip_head=$head_node:$port
export ip_head
echo "IP Head: $ip_head"
module load gcc/9.3.0 cmake/3.27.9 # replace with your cluster's module names
. ~/mythos/.venv/bin/activate # replace with your python virtual/conda environment
# --min-nodes waits for all nodes to join before starting the workload.
# --num-cpus sets the per-worker logical CPU count.
# (Comments must not follow a trailing backslash: that breaks the line continuation.)
srun --nodes="$SLURM_JOB_NUM_NODES" --ntasks="$SLURM_JOB_NUM_NODES" \
    ray symmetric-run \
    --address "$ip_head" \
    --min-nodes "$SLURM_JOB_NUM_NODES" \
    --num-cpus="${SLURM_CPUS_PER_TASK}" \
    -- \
    python -u persistence_length_optimization.py
Key sbatch Settings
tasks-per-node
Always set --tasks-per-node=1. Each Slurm node runs one Ray worker
process; Ray handles task-level parallelism internally. Setting this to a
higher value will launch multiple competing Ray workers on the same node,
leading to resource contention and unpredictable behavior.
cpus-per-task
Set --cpus-per-task to the number of CPU cores each node should expose to
Ray. This value is passed to ray symmetric-run via --num-cpus so that
Ray knows how many CPUs are available per node for scheduling tasks.
nodes
The total number of parallel simulators is determined by the total CPU count
across all nodes (cpus-per-task × nodes) divided by the CPUs allocated per
simulator via scheduler_hints. For oxDNA, each simulation typically runs
best with a single CPU (num_cpus=1), so four nodes with 16 cores each can
run up to 64 simulators in parallel.
Note that a single simulator cannot span multiple nodes — if a simulator
requires more CPUs than cpus-per-task, it will not be schedulable. Ensure
that the per-simulator num_cpus in scheduler_hints does not exceed
cpus-per-task.
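The arithmetic above can be sketched in a few lines of Python. This is only an illustrative helper (the function name and its parameters are hypothetical, not part of mythos or Ray). Because a simulator cannot span nodes, it counts whole simulators per node, which matches the total-CPU division whenever cpus-per-task is a multiple of the per-simulator num_cpus.

```python
def max_parallel_simulators(nodes: int, cpus_per_task: int, sim_num_cpus: int) -> int:
    """Estimate how many simulators can run at once (hypothetical helper).

    nodes         -- value of --nodes
    cpus_per_task -- value of --cpus-per-task (CPUs each node exposes to Ray)
    sim_num_cpus  -- per-simulator num_cpus set in scheduler_hints
    """
    if sim_num_cpus > cpus_per_task:
        # A simulator cannot span nodes, so it could never be scheduled.
        raise ValueError(
            f"num_cpus={sim_num_cpus} exceeds cpus-per-task={cpus_per_task}"
        )
    # Whole simulators that fit on one node, times the number of nodes.
    return nodes * (cpus_per_task // sim_num_cpus)


# Four nodes with 16 cores each and one CPU per simulator.
print(max_parallel_simulators(nodes=4, cpus_per_task=16, sim_num_cpus=1))  # 64
```

With num_cpus=3 on the same allocation, only five simulators fit per node (one core per node stays idle), giving 20 in total rather than 64 / 3 rounded down.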
Ray Cluster Startup
The script above uses ray symmetric-run, which handles starting and
connecting all workers to the head node automatically. The key flags are:
- --address "$ip_head": the head node address, derived from Slurm’s SLURM_JOB_NODELIST.
- --min-nodes "$SLURM_JOB_NUM_NODES": wait for all allocated nodes to join before starting the workload.
- --num-cpus: tells Ray how many CPUs each worker should advertise (should match --cpus-per-task).
Module Loading
If the cluster nodes require specific compiler toolchains or build tools (e.g.,
gcc, cmake), load them with module load in the sbatch script before
launching Ray. The oxDNA simulator recompiles its binary during
optimization, so cmake and a C++ compiler must be available on every node.
Memory Considerations
oxDNA Build Memory
The oxDNA source compilation (triggered each optimization step when parameters change) can require significant memory — potentially several gigabytes per build. If you observe out-of-memory errors during the build phase, consider:
- Requesting higher-memory nodes or partitions (via Slurm’s --mem or --mem-per-cpu options; the default may be insufficient).
- Reducing n_build_threads on the oxDNASimulator to lower peak memory during parallel compilation.
- Setting mem_mb in scheduler_hints so Ray avoids scheduling too many concurrent builds on the same node.
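from mythos.utils.scheduler import SchedulerHints

# `oxdna` is assumed to be imported already, as in your optimization script.
simulator = oxdna.oxDNASimulator(
    ...,  # other simulator arguments
    n_build_threads=4,
    scheduler_hints=SchedulerHints(
        num_cpus=4,
        mem_mb=8192,  # reserve 8 GB per task
    ),
)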
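To get a feel for how num_cpus and mem_mb together limit concurrency, the following back-of-the-envelope sketch may help. It is a simplified model, not Ray's actual scheduler: the function and the node_mem_mb parameter are hypothetical, and it assumes each task reserves exactly its hinted CPUs and memory on a single node.

```python
def concurrent_tasks_per_node(
    node_cpus: int, node_mem_mb: int, num_cpus: int, mem_mb: int
) -> int:
    """Rough upper bound on tasks that fit on one node (hypothetical helper).

    node_cpus   -- CPUs the node exposes to Ray (--cpus-per-task)
    node_mem_mb -- memory available to the job on that node (assumed known)
    num_cpus    -- per-task CPU hint from scheduler_hints
    mem_mb      -- per-task memory hint from scheduler_hints
    """
    by_cpu = node_cpus // num_cpus    # limit imposed by CPU reservations
    by_mem = node_mem_mb // mem_mb    # limit imposed by memory reservations
    return min(by_cpu, by_mem)


# 16 cores and an assumed 16 GB per node, with the hints shown above:
# CPUs alone would allow 4 concurrent builds, memory allows only 2.
print(concurrent_tasks_per_node(16, 16384, num_cpus=4, mem_mb=8192))  # 2
```

If the memory limit is the smaller of the two, either request more memory per node from Slurm or accept fewer concurrent builds.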
Trajectory and Gradient Memory
When many simulators feed trajectories into a single objective, memory can grow quickly. See the Memory Considerations section of the Ray Optimizer page for strategies on managing trajectory concatenation and gradient computation memory.
Other considerations
Other simulators, objectives, and callbacks may have their own memory requirements. Typically either the OS or Slurm will kill the job if it exceeds available or allocated memory, and the logs will show OOM kill messages.
Some options for diagnosing:
- Use Slurm’s memory monitoring tools (e.g., sacct with the MaxRSS field).
- Check Ray’s dashboard for worker resource usage and correlate which tasks are running when OOM kills occur.
- Adjust Slurm’s memory allocation and Ray’s scheduler hints iteratively on a scaled-down sample workflow to find a stable configuration.
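When comparing sacct output against mem_mb values, it can help to convert MaxRSS readings to megabytes. The sketch below is a hypothetical helper that assumes the values carry a K, M, G, or T suffix (for example 2048K), which is how sacct commonly prints them. It rejects anything else rather than guessing the unit.

```python
from typing import Optional

# Multipliers that convert a suffixed value to megabytes.
_TO_MB = {"K": 1 / 1024, "M": 1.0, "G": 1024.0, "T": 1024.0 * 1024.0}


def maxrss_to_mb(value: str) -> Optional[float]:
    """Convert a MaxRSS string such as '2048K' or '1.5G' to megabytes.

    Returns None for an empty field (sacct leaves MaxRSS blank on some rows).
    """
    value = value.strip()
    if not value:
        return None
    suffix = value[-1].upper()
    if suffix not in _TO_MB:
        raise ValueError(f"unrecognized MaxRSS unit in {value!r}")
    return float(value[:-1]) * _TO_MB[suffix]


print(maxrss_to_mb("1.5G"))  # 1536.0
```

A step whose converted MaxRSS approaches the per-task mem_mb hint is a sign that the hint, or the Slurm memory request, is too low.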
See Observability and Debugging on the Ray Optimizer page for information on using the Ray dashboard to monitor resource usage, including how to access it via SSH port forwarding on Slurm clusters.
Troubleshooting
Further advice on using Ray on Slurm can be found in the Ray documentation: Deploying on Slurm.
Workers fail to join
If the job times out waiting for workers, verify that all nodes can reach the head node on port 6379 (or your chosen port). Some clusters have firewall rules between compute nodes that block Ray’s communication ports. The port may also already be bound by another process, in which case choose a different one.
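One way to test reachability is a plain TCP connection attempt from a worker node to the head node. The sketch below uses only the Python standard library; the function name is hypothetical. A successful connection shows only that something is listening on that port, not that it is Ray.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    import sys

    # Usage: python check_port.py <head_node_hostname> [port]
    if len(sys.argv) >= 2:
        host = sys.argv[1]
        port = int(sys.argv[2]) if len(sys.argv) > 2 else 6379
        state = "reachable" if can_connect(host, port) else "NOT reachable"
        print(f"{host}:{port} is {state}")
```

Run it on a worker node while the Ray head is up. Running it on the head node before Ray starts, with the head node's own hostname, is a quick way to see whether another process already holds the port.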
Build failures on worker nodes
If oxDNA compilation fails on worker nodes but succeeds locally, ensure that
the required modules (gcc, cmake) are loaded in the sbatch script.
Worker nodes may not have the same default module set as login nodes.
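A quick check that the build tools are visible on a node can be made from Python before the optimization starts. This is a minimal sketch using the standard library; the function name is hypothetical, and the default tool names (gcc, cmake) are taken from the example above and may differ on your cluster.

```python
import shutil
from typing import Iterable, List


def missing_tools(tools: Iterable[str] = ("gcc", "cmake")) -> List[str]:
    """Return the names from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing build tools:", ", ".join(missing))
    else:
        print("All build tools found on PATH.")
```

Launching it with srun across the allocation, after the module load lines, shows whether every node sees the same toolchain as the login node.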
Out-of-memory kills
If Slurm’s OOM killer terminates your job, check whether the issue is during
oxDNA compilation or during gradient computation. Use mem_mb in
scheduler_hints to help Ray distribute memory-intensive tasks across nodes,
and consider requesting a higher-memory partition.