File: profiling.md

# Profiling VVL

The [Tracy](https://github.com/wolfpld/tracy) profiler has been set up. Get the documentation [here](https://github.com/wolfpld/tracy/releases/latest/download/tracy.pdf).

Tracy version used: 0.12.1 (alpha)

Relevant CMake options (an example configure command follows this list):
- `-D VVL_ENABLE_TRACY` Enable Tracy
- `-D VVL_ENABLE_TRACY_CPU_MEMORY` Enable Tracy CPU memory profiling
- `-D VVL_TRACY_CALLSTACK=<N>` Define maximum collected call stack size (default: 48)
- `-D VVL_ENABLE_TRACY_GPU` Enable GPU profiling (only retrieves timings for draw, dispatch and trace rays commands)
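
For instance, a Tracy-enabled build could be configured like this (the source/build paths, build type and call stack depth are only illustrative):

```bash
# Illustrative configure/build commands; adjust paths and values to your setup.
cmake -S . -B build \
      -D CMAKE_BUILD_TYPE=RelWithDebInfo \
      -D VVL_ENABLE_TRACY=ON \
      -D VVL_ENABLE_TRACY_CPU_MEMORY=ON \
      -D VVL_TRACY_CALLSTACK=16 \
      -D VVL_ENABLE_TRACY_GPU=ON
cmake --build build
```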

⚠️ Having a large call stack size can have a noticeable impact on performance.

⚠️ Make sure the various dependencies are compiled with the same optimization level and debug settings as VVL, otherwise expect crashes in `TracyAlloc/Free`.

- To retrieve data from kernel facilities, for instance to get fine-grained information on CPU usage through sampling, run the application VVL is injected into with elevated privileges. If you use VkConfig to enable VVL, do not forget to also launch it with elevated privileges.
⚠️ On Windows, having other applications running with elevated privileges can cause Tracy sampling to fail.
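
On Linux, launching both the profiled application and VkConfig with elevated privileges might look like this (the binary names are placeholders for your actual application and VkConfig executables):

```bash
# Placeholder names; run both the profiled application and VkConfig elevated
# so Tracy can collect kernel-level sampling data.
sudo ./my_vulkan_app
sudo vkconfig
```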

## Limitations

- You may notice a stall when shutting down the profiled application: have a look at the profiler, the "query backlog" (satellite icon, near the top right) is probably being emptied. This can take some time.

- Meant to be used with applications that are not short-lived and that create only one `VkInstance`.
=> So for now, forget about profiling our test suite...

- Manual profiler lifetime is used: the profiler is started when the VVL shared library is loaded, and it is shut down at `vkDestroyInstance` time.
=> It is thus assumed that the application creates only one instance; as of writing, this appears to be the case for most applications.

- CPU memory profiling cannot be used with mimalloc yet. It still needs to be set up; a quick stab at it showed that it blows up Tracy.

## GPU profiling 

- The implementation implicitly relies on the following Vulkan features (a sketch of checking for them follows this list):
  - `VK_EXT_calibrated_timestamps`
  - `VK_EXT_host_query_reset`
  - `vkGetPhysicalDeviceCalibrateableTimeDomainsEXT` reporting `VK_TIME_DOMAIN_QUERY_PERFORMANCE_COUNTER_EXT` on Windows and `VK_TIME_DOMAIN_CLOCK_MONOTONIC_RAW_EXT` on Unix
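
A minimal sketch of such a check, assuming the instance and physical device already exist (this is not the actual VVL/Tracy code):

```cpp
// Sketch: query the calibrateable time domains of a physical device and check
// that the domain expected on this platform is reported. Error handling is
// minimal on purpose.
#include <vulkan/vulkan.h>

#include <vector>

bool HasExpectedTimeDomain(VkInstance instance, VkPhysicalDevice gpu) {
    auto get_domains = reinterpret_cast<PFN_vkGetPhysicalDeviceCalibrateableTimeDomainsEXT>(
        vkGetInstanceProcAddr(instance, "vkGetPhysicalDeviceCalibrateableTimeDomainsEXT"));
    if (!get_domains) return false;  // VK_EXT_calibrated_timestamps not available

    uint32_t count = 0;
    get_domains(gpu, &count, nullptr);
    std::vector<VkTimeDomainEXT> domains(count);
    get_domains(gpu, &count, domains.data());

#if defined(_WIN32)
    const VkTimeDomainEXT wanted = VK_TIME_DOMAIN_QUERY_PERFORMANCE_COUNTER_EXT;
#else
    const VkTimeDomainEXT wanted = VK_TIME_DOMAIN_CLOCK_MONOTONIC_RAW_EXT;
#endif
    for (VkTimeDomainEXT domain : domains) {
        if (domain == wanted) return true;
    }
    return false;
}
```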

Those features seem to be present on all drivers used by the main VVL developers.

⚠️ Required features are added by hacking into `ValidationStateTracker::PreCallRecordCreateDevice`, so profiling can only work if `ValidationStateTracker` is somehow instantiated.

As of writing, the Tracy GPU profiling implementation only allows tracing individual commands, and it expects every profiled command to actually be submitted: due to the current query collection algorithm, queries recorded after unsubmitted commands would otherwise never be retrieved.
Profiling is supposed to work with any application, and some do hit this failing case.
To cope, a custom Tracy fork is now used, with new features allowing profiling at queue-submit level: for every `vkQueueSubmit`, a timestamp is inserted before and after its submissions. This way, only GPU workloads that are actually executed get profiled, bypassing the limitation mentioned above.
For now, this coarse profiling granularity is enough.
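
A rough sketch of that idea, not the fork's actual implementation (command pool and query pool management, query reset and readback are omitted, and the helper name is hypothetical):

```cpp
// Sketch: record a tiny command buffer whose only job is to write a timestamp.
// One such command buffer is submitted before and one after the application's
// own submissions inside the intercepted vkQueueSubmit.
#include <vulkan/vulkan.h>

VkCommandBuffer RecordTimestampCmdBuffer(VkDevice device, VkCommandPool pool,
                                         VkQueryPool query_pool, uint32_t query) {
    VkCommandBufferAllocateInfo alloc_info{VK_STRUCTURE_TYPE_COMMAND_BUFFER_ALLOCATE_INFO};
    alloc_info.commandPool = pool;
    alloc_info.level = VK_COMMAND_BUFFER_LEVEL_PRIMARY;
    alloc_info.commandBufferCount = 1;
    VkCommandBuffer cb = VK_NULL_HANDLE;
    vkAllocateCommandBuffers(device, &alloc_info, &cb);

    VkCommandBufferBeginInfo begin_info{VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO};
    begin_info.flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT;
    vkBeginCommandBuffer(cb, &begin_info);
    // The query is assumed to have been reset beforehand, e.g. via
    // vkResetQueryPoolEXT (VK_EXT_host_query_reset).
    vkCmdWriteTimestamp(cb, VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, query_pool, query);
    vkEndCommandBuffer(cb);
    return cb;
}

// Inside the intercepted vkQueueSubmit, the submission order then becomes:
//   [ timestamp_before, <application submit infos>, timestamp_after ]
// so the span between the two timestamps covers exactly the GPU work that was
// actually submitted and executed.
```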

When looking at the `Statistics > GPU` window of a trace, you may notice that some action submits are missing. It just means that by the time the application shut down, not all GPU timestamp queries had been retrieved.