7.6. OpenCL kernel caching

OpenCL kernels are cached in memory and on disk at multiple levels during a MIRGE-Com execution. This reduces kernel compilation time when the same driver is run multiple times.

The following sections discuss MIRGE-Com-related packages that use caching, with a focus on configuring the disk-based caching.

Note

The following bash command removes all disk caches used by MIRGE-Com on Linux and macOS:

$ rm -rf $XDG_CACHE_HOME/pytools/pdict* ~/.cache/pytools/pdict* ~/Library/Caches/pytools/pdict* \
    $XDG_CACHE_HOME/pyopencl ~/.cache/pyopencl ~/Library/Caches/pyopencl \
    $POCL_CACHE_DIR $XDG_CACHE_HOME/pocl ~/.cache/pocl \
    ~/.nv/ComputeCache $CUDA_CACHE_PATH
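Before deleting anything, it can help to see which of these locations actually exist. The following is a minimal sketch of such a dry run, not part of MIRGE-Com: the function name list_mirgecom_disk_caches is made up for illustration, and the candidate paths are simply the ones from the command above. It also falls back to ~/.cache when $XDG_CACHE_HOME is unset, since an unquoted $XDG_CACHE_HOME/pyopencl would otherwise expand to /pyopencl.

```shell
# Print every cache location from the removal command that exists on this
# machine. Nothing is deleted. The function name is illustrative only.
list_mirgecom_disk_caches() {
    # Fall back to ~/.cache so that an unset $XDG_CACHE_HOME never yields a
    # path directly under "/".
    local xdg="${XDG_CACHE_HOME:-$HOME/.cache}"
    local cache_path
    for cache_path in \
        "$xdg"/pytools/pdict* \
        "$HOME"/.cache/pytools/pdict* \
        "$HOME"/Library/Caches/pytools/pdict* \
        "$xdg"/pyopencl "$HOME"/.cache/pyopencl "$HOME"/Library/Caches/pyopencl \
        "${POCL_CACHE_DIR:-$xdg/pocl}" "$xdg"/pocl "$HOME"/.cache/pocl \
        "${CUDA_CACHE_PATH:-$HOME/.nv/ComputeCache}" "$HOME"/.nv/ComputeCache
    do
        # Globs that match nothing stay literal and fail the existence test.
        if [ -e "$cache_path" ]; then
            printf '%s\n' "$cache_path"
        fi
    done | sort -u   # several candidates can resolve to the same directory
}

list_mirgecom_disk_caches
```

Each printed path can then be inspected (for example with du -sh) or removed individually.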

Note

The following bash code can be used to disable all disk caches:

$ export LOOPY_NO_CACHE=1
$ export PYOPENCL_NO_CACHE=1
$ export POCL_KERNEL_CACHE=0
$ export CUDA_CACHE_DISABLE=1

Note

Disabling disk caching for a specific package only affects that particular package. For example, disabling disk caching for loopy does not affect the caching behavior of pyopencl or PoCL.
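Because each variable controls only its own package, all four must be set together to get a run without any disk caching. As a sketch, the wrapper below sets them for a single command instead of exporting them into the whole shell session. The function name run_without_disk_caches is hypothetical; the variables and values are the ones listed above.

```shell
# Run one command with every disk cache disabled, leaving the calling
# shell's environment untouched. The function name is illustrative only.
run_without_disk_caches() {
    env LOOPY_NO_CACHE=1 \
        PYOPENCL_NO_CACHE=1 \
        POCL_KERNEL_CACHE=0 \
        CUDA_CACHE_DISABLE=1 \
        "$@"
}

# Example: show what the wrapped command sees.
run_without_disk_caches sh -c 'echo "loopy=$LOOPY_NO_CACHE pyopencl=$PYOPENCL_NO_CACHE pocl=$POCL_KERNEL_CACHE cuda=$CUDA_CACHE_DISABLE"'
```

A driver would be launched the same way, for example run_without_disk_caches python my_driver.py, where my_driver.py stands in for an actual driver script. To disable only one layer, keep just the corresponding variable.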

7.6.1. Loopy

loopy stores the source of generated PyOpenCL kernels and their invokers in $XDG_CACHE_HOME/pytools/pdict-*-loopy by default. Set LOOPY_NO_CACHE=1 to disable this cache. See the loopy documentation for details.

Note

loopy uses pytools.persistent_dict.PersistentDict for caching. PersistentDict also keeps an in-memory cache.

Note

When $XDG_CACHE_HOME is not set, the cache directory defaults to ~/.cache on Linux and ~/Library/Caches on macOS.
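The lookup rule in this note can be written out as a small shell function. This is only a sketch of the rule as stated here, not the code pytools runs, and the name user_cache_root is invented for the example. It treats an empty $XDG_CACHE_HOME the same as an unset one, which is an assumption.

```shell
# Resolve the cache root as described in the note above.
# The function name is illustrative only.
user_cache_root() {
    if [ -n "${XDG_CACHE_HOME:-}" ]; then
        printf '%s\n' "$XDG_CACHE_HOME"
    elif [ "$(uname -s)" = "Darwin" ]; then
        # macOS default
        printf '%s\n' "$HOME/Library/Caches"
    else
        # Linux default
        printf '%s\n' "$HOME/.cache"
    fi
}

# List loopy's persistent dictionaries below that root, if there are any.
ls -d "$(user_cache_root)"/pytools/pdict-*-loopy 2>/dev/null \
    || echo "no loopy disk cache found"
```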

7.6.2. PyOpenCL

By default, pyopencl caches kernel source code and the binaries returned by the OpenCL runtime in $XDG_CACHE_HOME/pyopencl, and invokers and generated source code in $XDG_CACHE_HOME/pytools/pdict-*-pyopencl. Set PYOPENCL_NO_CACHE=1 to disable this cache. See the PyOpenCL documentation for details.

Note

PyOpenCL does not cache kernel binaries in memory by default. To keep the compiled version of a kernel in memory, simply retain the pyopencl.Program or pyopencl.Kernel objects. Loopy’s loopy.LoopKernel already holds handles to compiled pyopencl.Kernel objects.

Note

PyOpenCL uses clCreateProgramWithSource on the first compilation and caches the OpenCL binary it retrieves. The second time the same source is compiled, it uses clCreateProgramWithBinary to hand the binary to the CL runtime (such as PoCL). This can lead to different caching behaviors on the first three compilations depending on how the CL runtime itself performs caching.

7.6.3. PoCL

PoCL stores compilation results (LLVM bitcode and shared libraries) in $POCL_CACHE_DIR if that variable is set, and in $XDG_CACHE_HOME/pocl otherwise. Set POCL_KERNEL_CACHE=0 to disable this cache. See the PoCL documentation for details.

Note

When neither $POCL_CACHE_DIR nor $XDG_CACHE_HOME is set, PoCL’s cache directory defaults to ~/.cache/pocl on both Linux and macOS.
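Put together, the precedence described in this section can be sketched as follows. The function pocl_cache_dir is a made-up helper that mirrors the rule as documented here; it does not ask PoCL itself, and treating empty variables like unset ones is an assumption.

```shell
# Resolve PoCL's cache directory following the precedence described above:
# $POCL_CACHE_DIR, then $XDG_CACHE_HOME/pocl, then ~/.cache/pocl.
# The function name is illustrative only.
pocl_cache_dir() {
    if [ -n "${POCL_CACHE_DIR:-}" ]; then
        printf '%s\n' "$POCL_CACHE_DIR"
    elif [ -n "${XDG_CACHE_HOME:-}" ]; then
        printf '%s\n' "$XDG_CACHE_HOME/pocl"
    else
        printf '%s\n' "$HOME/.cache/pocl"
    fi
}

echo "PoCL cache directory: $(pocl_cache_dir)"
```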

7.6.4. CUDA

CUDA stores binary kernels in ~/.nv/ComputeCache by default. This applies to Linux only, since we do not support CUDA devices on macOS. Set CUDA_CACHE_DISABLE=1 to disable this cache, or set CUDA_CACHE_PATH to select a different cache directory. See the CUDA documentation for details.

Warning

The CUDA JIT cache is disabled by default on Lassen, i.e., CUDA_CACHE_DISABLE=1 is set by default. Source: email by J. Gyllenhaal on 03/12/2020.
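Since the default differs between machines, it may be worth checking the environment before relying on the CUDA cache. The helper below is a sketch under that assumption: cuda_cache_status is a hypothetical name, and the function only inspects the two environment variables named in this section. It does not query the CUDA driver.

```shell
# Report the CUDA JIT cache settings implied by the current environment.
# The function name is illustrative only.
cuda_cache_status() {
    if [ "${CUDA_CACHE_DISABLE:-0}" = "1" ]; then
        echo "CUDA JIT cache: disabled (CUDA_CACHE_DISABLE=1)"
    else
        echo "CUDA JIT cache: enabled, directory ${CUDA_CACHE_PATH:-$HOME/.nv/ComputeCache}"
    fi
}

cuda_cache_status
```

If the cache is wanted on a system where it is disabled by default, setting CUDA_CACHE_DISABLE=0 in the job script should re-enable it, assuming nothing else in the environment overrides the variable.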