Perfetto Tracing
================
Mesa has experimental support for `Perfetto <https://perfetto.dev>`__ for
GPU performance monitoring. Perfetto supports multiple
`producers <https://perfetto.dev/docs/concepts/service-model>`__, each with
one or more data-sources. Perfetto already provides various producers and
data-sources for things like:

- CPU scheduling events (``linux.ftrace``)
- CPU frequency scaling (``linux.ftrace``)
- System calls (``linux.ftrace``)
- Process memory utilization (``linux.process_stats``)

It also provides various domain-specific producers.

Mesa's Perfetto support adds further producers so that GPU performance
(frequency, utilization, performance counters, etc.) can be visualized on the
same timeline, which helps to understand, tune, and debug system-level
performance:

- pps-producer: A system-wide daemon that can collect global performance
  counters.
- mesa: Per-process producer within Mesa to capture render-stage traces
  on the GPU timeline, track events on the CPU timeline, etc.

The exact supported features vary per driver:

.. list-table:: Supported data-sources
   :header-rows: 1

   * - Driver
     - PPS Counters
     - Render Stages
   * - Freedreno
     - ``gpu.counters.msm``
     - ``gpu.renderstages.msm``
   * - Turnip
     - ``gpu.counters.msm``
     - ``gpu.renderstages.msm``
   * - Intel
     - ``gpu.counters.i915``
     - ``gpu.renderstages.intel``
   * - Panfrost
     - ``gpu.counters.panfrost``
     -
   * - V3D
     - ``gpu.counters.v3d``
     -
   * - V3DV
     - ``gpu.counters.v3d``
     -

Run
---

To capture a trace with Perfetto you need to take the following steps:

1. Build Perfetto from sources available at ``subprojects/perfetto`` following
   `this guide <https://perfetto.dev/docs/quickstart/linux-tracing>`__.

2. Create a `trace config <https://perfetto.dev/docs/concepts/config>`__, which is
   a protobuf text-format file with extension ``.cfg``, or use one of the config
   files under the ``src/tool/pps/cfg`` directory. More examples of config files
   can be found in ``subprojects/perfetto/test/configs``.

3. Change directory to ``subprojects/perfetto`` and run a
   `convenience script <https://perfetto.dev/docs/getting-started/system-tracing#recording-your-first-system-trace>`__
   to start the tracing service:

   .. code-block:: sh

      cd subprojects/perfetto
      CONFIG=<path/to/gpu.cfg> OUT=out/linux_clang_release ./tools/tmux -n

4. Start other producers you may need, e.g. ``pps-producer``.

5. Start ``perfetto`` under the tmux session initiated in step 3.

6. Once tracing has finished, you can detach from tmux with :kbd:`Ctrl+b`,
   :kbd:`d`, and the convenience script should automatically copy the trace
   files into ``$HOME/Downloads``.

7. Go to `ui.perfetto.dev <https://ui.perfetto.dev>`__ and upload
   ``$HOME/Downloads/trace.protobuf`` by clicking on **Open trace file**.

8. Alternatively you can open the trace in `AGI <https://gpuinspector.dev/>`__
   (which despite the name can be used to view non-Android traces).

To be a bit more explicit, here is a listing of commands reproducing
the steps above:

.. code-block:: sh

   # Configure Mesa with perfetto
   mesa $ meson setup build -Dperfetto=true -Dvulkan-drivers=intel,broadcom -Dgallium-drivers=

   # Build mesa
   mesa $ meson compile -C build

   # Within the Mesa repo, build perfetto
   mesa $ cd subprojects/perfetto
   perfetto $ ./tools/install-build-deps
   perfetto $ ./tools/gn gen --args='is_debug=false' out/linux
   perfetto $ ./tools/ninja -C out/linux

   # Start perfetto
   perfetto $ CONFIG=../../src/tool/pps/cfg/gpu.cfg OUT=out/linux/ ./tools/tmux -n

   # In parallel from the Mesa repo, start the PPS producer
   mesa $ ./build/src/tool/pps/pps-producer

   # Back in the perfetto tmux, press enter to start the capture
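
The listing above relies on the stock ``src/tool/pps/cfg/gpu.cfg``. As a rough
sketch of what a custom config might look like, the script below writes a small
protobuf text-format file that enables the Intel data-sources named in the
table of supported data-sources. The file name, buffer size, sampling period,
and duration are illustrative assumptions, not recommended values, and the
``gpu_counter_config`` block should be checked against the Perfetto version in
``subprojects/perfetto``.

.. code-block:: sh

   #!/bin/sh
   # Write an illustrative trace config. All numeric values are assumptions.
   cfg_dir=$(mktemp -d)
   cfg="$cfg_dir/my-gpu.cfg"   # hypothetical file name

   cat > "$cfg" <<'EOF'
   buffers {
     size_kb: 16384
     fill_policy: RING_BUFFER
   }

   data_sources {
     config {
       name: "gpu.counters.i915"
       gpu_counter_config {
         counter_period_ns: 100000
       }
     }
   }

   data_sources {
     config {
       name: "gpu.renderstages.intel"
     }
   }

   duration_ms: 10000
   EOF

   echo "wrote $cfg"

The resulting path can then be passed as ``CONFIG=`` to ``./tools/tmux`` in
place of the stock config.
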

CPU Tracing
~~~~~~~~~~~

Mesa's CPU tracepoints (``MESA_TRACE_*``) use Perfetto track events when
Perfetto is enabled. They use the ``mesa.default`` and ``mesa.slow`` categories.

Currently, only EGL and the following drivers have CPU tracepoints:

- Freedreno
- Panfrost
- Turnip
- V3D
- VC4
- V3DV

Vulkan data sources
~~~~~~~~~~~~~~~~~~~

The Vulkan API gives the application control over the recording of command
buffers as well as when they are submitted to the hardware. As a
consequence, command buffers must be instrumented for the Perfetto
driver data sources before Perfetto actually starts collecting traces.

This can be achieved by setting the :envvar:`MESA_GPU_TRACES`
environment variable before starting a Vulkan application:

.. code-block:: sh

   MESA_GPU_TRACES=perfetto ./build/my_vulkan_app
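
When several applications are launched from the same shell, it can be handy to
scope the variable to a single process instead of exporting it. The helper
below is a minimal sketch of that idea; ``run_traced`` is a hypothetical name
and not part of Mesa.

.. code-block:: sh

   #!/bin/sh
   # run_traced CMD [ARGS...]
   # Run CMD with MESA_GPU_TRACES=perfetto set for that process only,
   # leaving the calling shell's environment untouched.
   run_traced() {
       env MESA_GPU_TRACES=perfetto "$@"
   }

   # Example (assumes the application path from the text above):
   #   run_traced ./build/my_vulkan_app
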

Driver Specifics
~~~~~~~~~~~~~~~~

Below is driver-specific information and instructions for the PPS producer.

Freedreno / Turnip
^^^^^^^^^^^^^^^^^^

The Freedreno PPS driver needs root access to read system-wide
performance counters, so you can simply run it with sudo:

.. code-block:: sh

   sudo ./build/src/tool/pps/pps-producer

Intel
^^^^^

The Intel PPS driver needs root access to read system-wide
`RenderBasic <https://www.intel.com/content/www/us/en/docs/vtune-profiler/user-guide/2023-0/gpu-metrics-reference.html>`__
performance counters, so you can simply run it with sudo:

.. code-block:: sh

   sudo ./build/src/tool/pps/pps-producer

Another option, which enables access to system-wide data without root
permissions, is to run the following:

.. code-block:: sh

   sudo sysctl dev.i915.perf_stream_paranoid=0

Alternatively, granting the ``CAP_PERFMON`` capability to the binary should work too.
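
To decide which of these options applies on a given machine, the current
sysctl value can be inspected first. The sketch below assumes the sysctl is
exposed at ``/proc/sys/dev/i915/perf_stream_paranoid`` (the usual procfs
mapping of ``dev.i915.perf_stream_paranoid``); ``pps_needs_root`` is a
hypothetical helper name, and the check does not account for ``CAP_PERFMON``.

.. code-block:: sh

   #!/bin/sh
   # pps_needs_root FILE
   # Print "no" when FILE is readable and contains 0, i.e. system-wide
   # i915 perf streams are open to unprivileged users; print "yes" otherwise.
   pps_needs_root() {
       paranoid_file=$1
       if [ -r "$paranoid_file" ] && [ "$(cat "$paranoid_file")" = "0" ]; then
           echo no
       else
           echo yes
       fi
   }

   pps_needs_root /proc/sys/dev/i915/perf_stream_paranoid
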

A particular metric set can also be selected to capture a different
set of HW counters:

.. code-block:: sh

   INTEL_PERFETTO_METRIC_SET=RasterizerAndPixelBackend ./build/src/tool/pps/pps-producer

Vulkan applications can also be instrumented to be Perfetto producers.
To enable this for a given application, set the environment variable as
follows:

.. code-block:: sh

   PERFETTO_TRACE=1 my_vulkan_app

Panfrost
^^^^^^^^

The Panfrost PPS driver uses unstable ioctls that behave correctly on
kernel versions `5.4.23+ <https://lwn.net/Articles/813601/>`__ and
`5.5.7+ <https://lwn.net/Articles/813600/>`__.

To run the producer, follow these two steps:

1. Enable the Panfrost unstable ioctls via the kernel module parameter:

   .. code-block:: sh

      modprobe panfrost unstable_ioctls=1

   Alternatively, you could add ``panfrost.unstable_ioctls=1`` to your kernel
   command line, or run
   ``echo 1 > /sys/module/panfrost/parameters/unstable_ioctls``.

2. Run the producer:

   .. code-block:: sh

      ./build/src/tool/pps/pps-producer
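
The kernel version requirement above can be checked with a few lines of shell.
This is a sketch only: ``panfrost_kernel_ok`` is a hypothetical helper, and it
assumes that every release series newer than 5.5 also behaves correctly,
which the text above does not state explicitly.

.. code-block:: sh

   #!/bin/sh
   # panfrost_kernel_ok VERSION
   # Succeed when VERSION is 5.4.23 or later within 5.4.x, or 5.5.7 or later.
   # Assumption: all series after 5.5 are fine as well.
   panfrost_kernel_ok() {
       ver=${1%%[!0-9.]*}      # drop suffixes such as "-generic" or "-rc1"
       ver="$ver.0.0"          # make sure three numeric fields exist
       major=$(echo "$ver" | cut -d. -f1)
       minor=$(echo "$ver" | cut -d. -f2)
       patch=$(echo "$ver" | cut -d. -f3)
       if [ "$major" -gt 5 ]; then return 0; fi
       if [ "$major" -lt 5 ]; then return 1; fi
       if [ "$minor" -gt 5 ]; then return 0; fi
       if [ "$minor" -eq 5 ]; then [ "$patch" -ge 7 ]; return; fi
       if [ "$minor" -eq 4 ]; then [ "$patch" -ge 23 ]; return; fi
       return 1
   }

   if panfrost_kernel_ok "$(uname -r)"; then
       echo "kernel version looks suitable for the Panfrost PPS driver"
   else
       echo "kernel may be too old for the Panfrost unstable ioctls"
   fi
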

V3D / V3DV
^^^^^^^^^^

As only one performance monitor can be active at a given time, at most
32 performance counters can be monitored. The performance counters of interest
must therefore be selected for ``pps-producer`` using the environment variable
``V3D_DS_COUNTER``:

.. code-block:: sh

   V3D_DS_COUNTER=cycle-count,CLE-bin-thread-active-cycles,CLE-render-thread-active-cycles,QPU-total-uniform-cache-hit ./src/tool/pps/pps-producer
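
Because of the 32-counter limit, it may be worth validating the list before
starting the producer. The following sketch counts the comma-separated entries
of a candidate ``V3D_DS_COUNTER`` value; ``count_counters`` and
``check_counters`` are hypothetical helper names, and the check only looks at
the number of entries, not at whether the counter names exist.

.. code-block:: sh

   #!/bin/sh
   MAX_COUNTERS=32   # limit stated in the text above

   # Print the number of non-empty comma-separated entries in $1.
   count_counters() {
       printf '%s\n' "$1" | tr ',' '\n' | grep -c . || true
   }

   # Fail with a message on stderr when $1 lists more than MAX_COUNTERS entries.
   check_counters() {
       n=$(count_counters "$1")
       if [ "$n" -gt "$MAX_COUNTERS" ]; then
           echo "too many counters: $n > $MAX_COUNTERS" >&2
           return 1
       fi
       echo "$n counters selected"
   }

   check_counters "cycle-count,CLE-bin-thread-active-cycles,CLE-render-thread-active-cycles,QPU-total-uniform-cache-hit"
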

Troubleshooting
---------------

Tmux
~~~~

If the convenience script ``tools/tmux`` keeps copying artifacts to your
``SSH_TARGET`` without starting the tmux session, make sure you have ``tmux``
installed on your system.

.. code-block:: sh

   apt install tmux

Missing counter names
~~~~~~~~~~~~~~~~~~~~~

If the trace viewer shows a list of counters with a description like
``gpu_counter(#)`` instead of their proper names, data was probably lost
because the trace buffer filled up and wrapped.

To prevent this loss of data, you can tweak the trace config file in
one of the following ways:

- Increase the size of the buffer in use:

  .. code-block:: javascript

     buffers {
         size_kb: 2048,
         fill_policy: RING_BUFFER,
     }

- Periodically flush the trace buffer into the output file:

  .. code-block:: javascript

     write_into_file: true
     file_write_period_ms: 250

- Discard new traces when the buffer fills:

  .. code-block:: javascript

     buffers {
         size_kb: 2048,
         fill_policy: DISCARD,
     }