7.5. Running on specific systems

This section discusses how to run mirgecom on various clusters. There are also several example run scripts in mirgecom’s examples/ folder.

7.5.1. General

In general, we recommend running mirgecom with 1 MPI rank (i.e., one Python process) per cluster node. For GPU execution, we recommend 1 MPI rank per GPU. Kernel execution is parallelized automatically by pocl (on CPU or GPU, depending on the options you select and the hardware available on the system).
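For example, on a generic system with a working MPI installation, a two-rank run of one of the drivers in examples/ might look like the sketch below (mpiexec and vortex-mpi.py are placeholders; substitute your system's launcher and the driver you actually want to run):

# Select the OpenCL device for pocl (CPU here; see the Lassen example below for GPU selection):
export PYOPENCL_CTX="port:pthread"

# -O: switch on optimizations
# -m mpi4py: run under mpi4py so MPI is aborted cleanly on unhandled exceptions
mpiexec -n 2 python -O -m mpi4py ./vortex-mpi.py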

7.5.2. Quartz

On the Quartz machine, running mirgecom should be straightforward. An example batch script for the Slurm batch system is given below:

#!/bin/bash
#SBATCH -N 2                        # number of nodes
#SBATCH -t 00:30:00                 # walltime (hh:mm:ss)
#SBATCH -p pbatch                   # queue to use

# Run this script with 'sbatch quartz.sbatch.sh'

# Put any environment activation here, e.g.:
# source ../../config/activate_env.sh

# OpenCL device selection:
# export PYOPENCL_CTX="port:pthread"  # Run on CPU with pocl

nnodes=$SLURM_JOB_NUM_NODES
nproc=$nnodes # 1 rank per node

echo nnodes=$nnodes nproc=$nproc

# See
# https://mirgecom.readthedocs.io/en/latest/running.html#avoiding-overheads-due-to-caching-of-kernels
# on why this is important
export XDG_CACHE_HOME="/tmp/$USER/xdg-scratch"

# Run application
# -O: switch on optimizations
srun -n $nproc python -O -m mpi4py ./vortex-mpi.py

Run this with sbatch <script.sh>.
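For instance, assuming the script above was saved as quartz.sbatch.sh (any file name works), submitting and monitoring the job might look like:

sbatch quartz.sbatch.sh   # submit the job
squeue -u $USER           # check the job's status in the queue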

More information about Quartz can be found in LLNL's HPC documentation.

7.5.3. Lassen

On Lassen, we recommend running 1 MPI rank per GPU on each node. Care must be taken to restrict each rank to a separate GPU so that ranks do not compete for access to the same device. The easiest way to do this is by specifying the -g 1 argument to lrun, or equivalently -g 1 -a 1 to jsrun (one GPU and one task per resource set), as in the example below. An example batch script for the LSF batch system is given below:

#!/bin/bash
#BSUB -nnodes 4                   # number of nodes
#BSUB -W 30                       # walltime in minutes
#BSUB -q pbatch                   # queue to use

# Run this script with 'bsub lassen.bsub.sh'

# Put any environment activation here, e.g.:
# source ../../config/activate_env.sh

# OpenCL device selection:
export PYOPENCL_CTX="port:tesla"      # Run on Nvidia GPU with pocl
# export PYOPENCL_CTX="port:pthread"  # Run on CPU with pocl

# LSB_MCPU_HOSTS is a list of "hostname ncores" pairs that includes the launch
# node, so the number of compute nodes is (number of words)/2 - 1.
nnodes=$(echo $LSB_MCPU_HOSTS | wc -w)
nnodes=$((nnodes/2-1))
nproc=$((4*nnodes)) # 4 ranks per node, 1 per GPU

echo nnodes=$nnodes nproc=$nproc

# -a 1: 1 task per resource set
# -g 1: 1 GPU per resource set
# -n $nproc: $nproc resource sets
jsrun_cmd="jsrun -g 1 -a 1 -n $nproc"

# See
# https://mirgecom.readthedocs.io/en/latest/running.html#avoiding-overheads-due-to-caching-of-kernels
# on why this is important
export XDG_CACHE_HOME_ROOT="/tmp/$USER/xdg-scratch/rank"

# Fixes https://github.com/illinois-ceesd/mirgecom/issues/292
# (each rank needs its own POCL cache dir)
export POCL_CACHE_DIR_ROOT="/tmp/$USER/pocl-cache/rank"

# Print task allocation
$jsrun_cmd js_task_info

echo "----------------------------"

# Run application
# -O: switch on optimizations
# POCL_CACHE_DIR=...: each rank needs its own POCL cache dir
# XDG_CACHE_HOME=...: each rank needs its own Loopy/PyOpenCL cache dir
$jsrun_cmd bash -c 'POCL_CACHE_DIR=$POCL_CACHE_DIR_ROOT$OMPI_COMM_WORLD_RANK XDG_CACHE_HOME=$XDG_CACHE_HOME_ROOT$OMPI_COMM_WORLD_RANK python -O -m mpi4py ./pulse-mpi.py'

Run this with bsub <script.sh>.
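For instance, assuming the script above was saved as lassen.bsub.sh:

bsub lassen.bsub.sh   # submit the job
bjobs -u $USER        # check the job's status in the queue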

More information about Lassen can be found in LLNL's HPC documentation.