7.3. Profiling and logging¶
7.3.1. Profiling memory consumption with memray¶
MIRGE-Com automatically tracks overall memory consumption on host and
devices via logpyle, but memray can be used to gain a finer-grained
understanding of how much memory is allocated in which parts of the code.
MIRGE-Com allocates two types of memory during execution:
- Python host memory for numpy data, Python lists and dicts, etc. This memory is always heap-allocated via malloc() calls.
- OpenCL device memory for the mesh, etc. At the time of this writing, this memory is by default allocated via OpenCL’s Shared Virtual Memory (SVM) mechanism and uses a pyopencl memory pool. When running with pocl on the CPU, the SVM memory is allocated via malloc() calls. When running with pocl on Nvidia GPUs, the SVM memory is allocated using CUDA’s managed (unified) memory, via cuMemAllocManaged().
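As a generic illustration of the first category (plain Python, not MIRGE-Com-specific), the standard-library tracemalloc module can observe such host-side heap allocations:

```python
import tracemalloc

# Host memory for Python objects (lists, dicts, numpy buffers) is
# heap-allocated via malloc() and is visible to allocation tracers
# such as tracemalloc or memray.
tracemalloc.start()
data = [float(i) for i in range(100_000)]
current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
print(current, peak)
```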
After installing memray (e.g. via $ conda install memray), memory
consumption can be profiled on Linux or macOS in the following way:
# Collect the trace:
$ python -m memray run --native -m mpi4py examples/wave.py --lazy
[...]
# Create a flamegraph HTML
$ python -m memray flamegraph memray-wave.py.44955.bin
[...]
# Open the HTML file
$ open memray-flamegraph-wave.44955.html
Note
The flamegraph analysis (as well as other analysis tools) needs to be run on the same system where the trace was collected, as it needs access to the symbols from the machine’s binaries. The resulting HTML files can be opened on any system.
Note
Although tracing the allocations has a low performance overhead, the resulting
trace files and flamegraphs can reach sizes of hundreds of MBytes.
memray releases after 1.6.0 will include an option (--aggregate) to
reduce the sizes of these files.
Warning
For the reasons outlined in the next subsection, we highly recommend running the analysis on CPUs, not GPUs.
7.3.1.1. Common issues¶
Incorrectly low memory consumption when running with pocl-cuda on GPUs
When running with pocl-cuda on Nvidia GPUs, the memory consumption will appear to be much lower than when running the same analysis on the CPU. The reason for this is that we use unified memory on Nvidia GPUs, in which case the SVM memory allocations are not counted against the running application, but against the CUDA driver and runtime, thus hiding the memory consumption from tools such as ps or memray. The overall consumption can still be estimated by looking at the system memory via e.g. free.
High virtual memory consumption with an installed pocl-cuda
When pocl-cuda initializes, it consumes a large amount of virtual memory (~100 GByte) just due to the initialization. To make the output of memray easier to understand (e.g., memray sizes the flamegraph according to virtual memory consumed), we recommend disabling or uninstalling pocl-cuda when profiling memory consumption, via e.g. $ conda uninstall pocl-cuda.
7.3.2. Profiling kernel execution¶
You can use mirgecom.profiling.PyOpenCLProfilingArrayContext instead of
PyOpenCLArrayContext to profile kernel executions.
In addition to using this array context, you also need to enable profiling in the
underlying pyopencl.CommandQueue, like this:
import pyopencl as cl  # assumes cl_ctx is an existing pyopencl.Context

queue = cl.CommandQueue(cl_ctx,
    properties=cl.command_queue_properties.PROFILING_ENABLE)
Note that profiling has a performance impact (~20% at the time of this writing).
- class mirgecom.profiling.PyOpenCLProfilingArrayContext(queue, allocator=None, logmgr=None)[source]¶
An array context that profiles OpenCL kernel executions.
- Parameters:
logmgr (LogManager | None)
- tabulate_profiling_data()[source]¶
Return a pytools.Table with the profiling results.
- Return type:
Table
- get_profiling_data_for_kernel(kernel_name)[source]¶
Return profiling data for kernel kernel_name.
- Parameters:
kernel_name (str)
- Return type:
MultiCallKernelProfile
- reset_profiling_data_for_kernel(kernel_name)[source]¶
Reset profiling data for kernel kernel_name.
- Parameters:
kernel_name (str)
- Return type:
None
Inherits from arraycontext.PyOpenCLArrayContext.
Note
Profiling of pyopencl kernels (that is, kernels that do not get called through call_loopy()) is restricted to a single instance of this class. If there are multiple instances, only the first one created will be able to profile these kernels.
- class mirgecom.profiling.SingleCallKernelProfile(time, flops, bytes_accessed, footprint_bytes)[source]¶
Class to hold the results of a single kernel execution.
- class mirgecom.profiling.MultiCallKernelProfile(num_calls, time, flops, bytes_accessed, footprint_bytes)[source]¶
Class to hold the results of multiple kernel executions.
- Parameters:
num_calls (int)
time (StatisticsAccumulator)
flops (StatisticsAccumulator)
bytes_accessed (StatisticsAccumulator)
footprint_bytes (StatisticsAccumulator)
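The relationship between these two classes can be sketched in plain Python. This is an illustrative model only: the real classes hold logpyle StatisticsAccumulator objects rather than the plain averages computed here.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SingleCall:
    # Simplified stand-in for SingleCallKernelProfile
    time: float           # seconds
    flops: int
    bytes_accessed: int


def summarize(calls):
    # Reduce several single-call profiles into per-kernel statistics,
    # analogous to MultiCallKernelProfile (all statistics except
    # num_calls are averages).
    return {
        "num_calls": len(calls),
        "avg_time": mean(c.time for c in calls),
        "avg_flops": mean(c.flops for c in calls),
        "avg_bytes": mean(c.bytes_accessed for c in calls),
    }


profile = summarize([SingleCall(1e-3, 1000, 4096),
                     SingleCall(3e-3, 1000, 4096)])
print(profile["num_calls"], profile["avg_time"])
```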
7.3.3. Time series logging¶
MIRGE-Com supports logging of simulation and profiling quantities with the help
of logpyle. Logpyle requires classes that describe how the quantities to be
logged are calculated. The MIRGE-Com-specific classes are described below.
- class mirgecom.logging_quantities.StateConsumer(extract_vars_for_logging)[source]¶
Base class for quantities that require a state for logging.
- Parameters:
extract_vars_for_logging (Callable)
- __init__(extract_vars_for_logging)[source]¶
Store the function to extract state variables.
- Parameters:
extract_vars_for_logging (Callable) – Function taking (dim, state, eos) that returns a dict(quantity_name: values) of the state vars for a particular state.
- class mirgecom.logging_quantities.DiscretizationBasedQuantity(dcoll, quantity, op, extract_vars_for_logging, units_logging, name=None, axis=None, dd=DOFDesc(domain_tag=VolumeDomainTag(tag=<class 'grudge.dof_desc.VTAG_ALL'>), discretization_tag=<class 'grudge.dof_desc.DISCR_TAG_BASE'>))[source]¶
Logging support for physical quantities.
Possible rank aggregation operations (op) are: min, max, L2_norm.
- Parameters:
dcoll (DiscretizationCollection)
quantity (str)
op (str)
name (str | None)
axis (int | None)
- class mirgecom.logging_quantities.KernelProfile(actx, kernel_name)[source]¶
Logging support for statistics of the OpenCL kernel profiling (num_calls, time, flops, bytes_accessed, footprint).
All statistics except num_calls are averages.
- Parameters:
actx (PyOpenCLArrayContext) – The array context from which to collect statistics. Must have profiling enabled in the OpenCL command queue.
kernel_name (str) – Name of the kernel to profile.
- class mirgecom.logging_quantities.PythonMemoryUsage(name=None)[source]¶
Logging support for Python memory usage (RSS, host).
Uses psutil to track memory usage. Virtually no overhead.
- Parameters:
name (str | None)
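As an illustration of the quantity being logged, psutil reports the resident set size (RSS) of the current process; this standalone sketch is not the quantity class itself:

```python
import psutil

# Resident set size (RSS) of the current process, in bytes -- the same
# host-side quantity that PythonMemoryUsage logs.
rss = psutil.Process().memory_info().rss
print(f"RSS: {rss / 1e6:.1f} MB")
```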
- class mirgecom.logging_quantities.DeviceMemoryUsage(name=None)[source]¶
Logging support for GPU memory usage (Nvidia only currently).
- Parameters:
name (str | None)
- mirgecom.logging_quantities.initialize_logmgr(enable_logmgr, filename=None, mode='wu', mpi_comm=None)[source]¶
Create and initialize a mirgecom-specific logpyle.LogManager.
- Parameters:
- Return type:
LogManager | None
- mirgecom.logging_quantities.logmgr_add_cl_device_info(logmgr, queue)[source]¶
Add information about the OpenCL device to the log.
- Parameters:
logmgr (LogManager)
queue (CommandQueue)
- Return type:
None
- mirgecom.logging_quantities.logmgr_add_device_memory_usage(logmgr, queue)[source]¶
Add the OpenCL device memory usage to the log.
- Parameters:
logmgr (LogManager)
queue (CommandQueue)
- Return type:
None
- mirgecom.logging_quantities.logmgr_add_many_discretization_quantities(logmgr, dcoll, dim, extract_vars_for_logging, units_for_logging, dd=DOFDesc(domain_tag=VolumeDomainTag(tag=<class 'grudge.dof_desc.VTAG_ALL'>), discretization_tag=<class 'grudge.dof_desc.DISCR_TAG_BASE'>))[source]¶
Add default discretization quantities to the logmgr.
- Parameters:
logmgr (LogManager)
- Return type:
None
- mirgecom.logging_quantities.logmgr_add_mempool_usage(logmgr, pool)[source]¶
Add the memory pool usage to the log.
- Parameters:
logmgr (LogManager)
pool (MemoryPool | SVMPool)
- Return type:
None
- mirgecom.logging_quantities.add_package_versions(mgr, path_to_version_sh=None)[source]¶
Add the output of the emirge version.sh script to the log.
- Parameters:
mgr (LogManager) – The logpyle.LogManager to add the versions to.
path_to_version_sh (str | None) – Path to emirge’s version.sh script. The function will attempt to find this script automatically if this argument is not specified.
- Return type:
None
- mirgecom.logging_quantities.set_sim_state(mgr, dim, state, eos)[source]¶
Update the simulation state of all StateConsumer of the log manager.
- Parameters:
mgr (LogManager) – The logpyle.LogManager whose StateConsumer quantities will receive state.
- Return type:
None
- mirgecom.logging_quantities.logmgr_set_time(mgr, steps, time)[source]¶
Set the (current/initial) time/step count explicitly (e.g., for restart).
- Parameters:
mgr (LogManager)
steps (int)
time (float)
- Return type:
None
An overview of how to use logpyle is given in the Logpyle documentation.