GPU Acceleration
================

``mythos`` can leverage GPUs at multiple levels: the simulation backend (oxDNA
CUDA), the JAX runtime (for energy evaluation and gradient computation), and
the Ray scheduler (for distributing GPU resources across workers). This page
covers configuration for each.

.. contents:: On this page
   :local:
   :depth: 2

oxDNA CUDA Backend
------------------

The oxDNA simulator supports a CUDA backend that runs the simulation on an
NVIDIA GPU. This can dramatically accelerate individual simulations,
especially for large systems.

Enabling the CUDA backend
^^^^^^^^^^^^^^^^^^^^^^^^^

Set ``backend`` to ``CUDA`` in your oxDNA input file(s):

.. code-block:: text

   backend = CUDA
   CUDA_device =

When ``mythos`` detects ``backend = CUDA`` in the input configuration, it
automatically passes ``-DCUDA=ON`` to CMake during the oxDNA build step.

``CUDA_device`` is a relative index. When running with the Ray optimizer
backend it can generally be set to ``0``, because one GPU is assigned to each
task through Ray's scheduler hints (see below).

These options can be set in the input file directly, as above, or overridden
through the ``oxdna.oxDNASimulator`` constructor using ``input_overrides``:

.. code-block:: python

   simulator = oxdna.oxDNASimulator(
       ...,
       input_overrides={
           "backend": "CUDA",
           "CUDA_device": 0,  # optional: specify which GPU to use
       },
   )

Other input parameters may also be required to use the CUDA backend,
depending on the simulation type, and some simulation types do not support
CUDA at all. See the oxDNA documentation for details.

Build requirements
^^^^^^^^^^^^^^^^^^

The CUDA backend requires:

- An NVIDIA GPU with a supported compute capability
- The CUDA toolkit installed and available on the build node (``nvcc`` on
  ``PATH``)
- A compatible C++ compiler (e.g., ``gcc``)

If you are building on an HPC cluster, you will typically need to load CUDA
and compiler modules before running your optimization:

.. code-block:: bash

   module load gcc/9.3.0 cuda/11.8 cmake/3.27.9

.. note::

   When using the CUDA backend with the ``RayOptimizer``, ensure that CUDA is
   available on **every worker node** that may run an oxDNA simulation task.

See the oxDNA documentation for full build instructions and supported GPU
architectures.

JAX GPU Usage
-------------

JAX can use GPUs for energy function evaluation and gradient computation
(DiffTRe reweighting). A GPU-enabled JAX installation uses a GPU by default
if one is available.

Installing JAX with GPU support
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The default ``jax`` package is CPU-only. To enable GPU support, install the
CUDA-enabled variant:

.. code-block:: bash

   pip install "jax[cuda12]"

See the JAX installation guide for other CUDA versions, ROCm support, and
troubleshooting.

Controlling the JAX platform
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If a GPU-enabled JAX installation is present and a GPU is allocated via
``SchedulerHints``, JAX will automatically use the GPU; no extra
configuration is needed.

To force JAX onto the CPU instead (useful when GPU memory is limited and you
want to reserve it entirely for the simulation backend), set
``JAX_PLATFORM_NAME`` in the Ray worker environment. This is necessary
because ``jax.config`` calls in the driver process have no effect inside
workers:

.. code-block:: python

   import ray

   ray.init(
       runtime_env={
           "env_vars": {
               "JAX_ENABLE_X64": "True",
               "JAX_PLATFORM_NAME": "cpu",
           },
       },
   )

.. tip::

   A common pattern is to run the **oxDNA simulation** on the GPU (CUDA
   backend) while running the **JAX gradient computation** on the CPU. This
   avoids competition for GPU memory between the simulation binary and JAX's
   autodiff graph.

Other Simulators
-----------------

GROMACS
^^^^^^^

GROMACS supports GPU acceleration when built with CUDA or OpenCL. Since
``mythos`` invokes the ``gmx`` binary directly, GPU usage depends on how
GROMACS was built and configured on your system. Consult the GROMACS
installation guide for building with GPU support.
Once installed, GPU offloading is typically controlled via ``gmx mdrun``
flags (e.g., ``-nb gpu``, ``-pme gpu``).

LAMMPS
^^^^^^

LAMMPS supports GPU acceleration through several packages (``GPU``,
``KOKKOS``, ``INTEL``). As with GROMACS, ``mythos`` calls the ``lmp`` binary
directly, so GPU support depends on your LAMMPS build. See the LAMMPS GPU
documentation for build instructions and runtime configuration.

For both GROMACS and LAMMPS, use ``num_gpus`` in ``SchedulerHints`` to ensure
Ray allocates GPU resources appropriately for these tasks.

GPU Allocation with Ray Scheduler Hints
----------------------------------------

The ``RayOptimizer`` uses ``SchedulerHints`` to tell Ray how many GPUs each
task requires. Ray uses this information to partition available GPUs across
workers and set ``CUDA_VISIBLE_DEVICES`` accordingly.

Setting ``num_gpus``
^^^^^^^^^^^^^^^^^^^^

Specify GPU requirements per simulator or objective:

.. code-block:: python

   from mythos.utils.scheduler import SchedulerHints

   simulator = oxdna.oxDNASimulator(
       ...,
       scheduler_hints=SchedulerHints(
           num_cpus=4,
           num_gpus=1,  # reserve 1 GPU for this task
           mem_mb=8192,
       ),
   )

Fractional GPU sharing
^^^^^^^^^^^^^^^^^^^^^^

If your simulations are small enough that multiple can share a single GPU,
use fractional values:

.. code-block:: python

   scheduler_hints=SchedulerHints(
       num_gpus=0.5,  # two tasks can share one GPU
   )

Ray will schedule up to two tasks with ``num_gpus=0.5`` on the same GPU.

.. note::

   Fractional GPU sharing relies on tasks fitting within GPU memory
   simultaneously. If tasks exceed the GPU's memory when co-scheduled, you
   will see CUDA out-of-memory errors. Use ``num_gpus=1`` to guarantee
   exclusive GPU access per task.

For full details on scheduler hints, including ``mem_mb``, ``max_retries``,
and ``custom`` options, see the :doc:`ray_optimizer` page.
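
The fractional-sharing arithmetic described in this section can be sketched
in plain Python. The helper below is hypothetical (it is not part of
``mythos`` or Ray) and assumes that a fractional task never spans two
devices, so each GPU hosts ``floor(1 / num_gpus)`` such tasks. It only
estimates how many tasks could be co-scheduled; it says nothing about whether
they fit in GPU memory together.

.. code-block:: python

   import math
   from fractions import Fraction


   def max_concurrent_tasks(total_gpus: int, num_gpus_per_task: float) -> int:
       """Estimate how many tasks can hold GPU resources at the same time.

       Hypothetical helper for illustration only; assumes fractional tasks
       are packed onto single devices and never split across two GPUs.
       """
       if total_gpus < 0 or num_gpus_per_task <= 0:
           raise ValueError("total_gpus must be >= 0 and num_gpus_per_task > 0")
       # Convert to an exact fraction so values such as 0.1 divide cleanly.
       per_task = Fraction(num_gpus_per_task).limit_denominator(1000)
       if per_task < 1:
           # Each GPU is shared by floor(1 / per_task) tasks.
           return total_gpus * math.floor(1 / per_task)
       # Whole-GPU tasks: each task takes per_task devices exclusively.
       return math.floor(total_gpus / per_task)


   if __name__ == "__main__":
       for gpus, per_task in [(1, 0.5), (4, 0.25), (4, 1), (4, 2)]:
           n = max_concurrent_tasks(gpus, per_task)
           print(f"{gpus} GPU(s), num_gpus={per_task}: up to {n} task(s)")

Comparing this estimate with the number of simulations launched per
optimization step gives a quick indication of whether tasks will run
concurrently or queue for a free GPU slot.
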

Slurm GPU Partitions
---------------------

When running on an HPC cluster with GPU nodes, request a GPU partition and
allocate GPUs in your sbatch script:

.. code-block:: bash

   #SBATCH --partition=gpu
   #SBATCH --gres=gpu:1
   #SBATCH --ntasks-per-node=1
   #SBATCH --cpus-per-task=8

Ensure that the CUDA toolkit is loaded and that your ``scheduler_hints``
match the number of GPUs allocated per node. See :doc:`slurm` for the full
Slurm setup guide.
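
One way to sanity-check that the requested ``num_gpus`` does not exceed the
job's allocation is to count the devices listed in ``CUDA_VISIBLE_DEVICES``
from inside the job. This is a minimal sketch: both function names are
hypothetical, and it assumes the cluster exports ``CUDA_VISIBLE_DEVICES`` for
GPU allocations, which depends on the site's Slurm configuration.

.. code-block:: python

   import os
   from typing import Mapping, Optional


   def visible_gpu_count(env: Optional[Mapping[str, str]] = None) -> Optional[int]:
       """Count devices listed in CUDA_VISIBLE_DEVICES.

       Returns None when the variable is unset, because the number of
       usable GPUs is then unknown. Hypothetical helper for illustration.
       """
       env = os.environ if env is None else env
       value = env.get("CUDA_VISIBLE_DEVICES")
       if value is None:
           return None
       # The variable holds a comma-separated list of device indices or UUIDs.
       return len([device for device in value.split(",") if device.strip()])


   def hints_fit_allocation(
       num_gpus: float, env: Optional[Mapping[str, str]] = None
   ) -> bool:
       """Return False only when num_gpus clearly exceeds the visible GPUs."""
       count = visible_gpu_count(env)
       return count is None or num_gpus <= count


   if __name__ == "__main__":
       print("visible GPUs:", visible_gpu_count())
       print("num_gpus=1 fits:", hints_fit_allocation(1))

Running this at the start of the batch script, before the optimizer is
launched, surfaces a mismatch early instead of leaving Ray tasks waiting for
GPU resources that the job never received.
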