ANV
===

Experimental features
---------------------

.. _`Bindless model`:

Binding Model
-------------

Here is the ANV bindless binding model that was implemented for the
descriptor indexing feature of Vulkan 1.2:

.. graphviz::

  digraph G {
    fontcolor="black";
    compound=true;

    subgraph cluster_1 {
      label = "Binding Table (HW)";

      bgcolor="cornflowerblue";

      node [ style=filled,shape="record",fillcolor="white",
             label="RT0"    ] n0;
      node [ label="RT1"    ] n1;
      node [ label="dynbuf0"] n2;
      node [ label="set0"   ] n3;
      node [ label="set1"   ] n4;
      node [ label="set2"   ] n5;

      n0 -> n1 -> n2 -> n3 -> n4 -> n5 [style=invis];
    }
    subgraph cluster_2 {
      label = "Descriptor Set 0";

      bgcolor="burlywood3";
      fixedsize = true;

      node [ style=filled,shape="record",fillcolor="white", fixedsize = true, width=4,
             label="binding 0 - STORAGE_IMAGE\n anv_storage_image_descriptor"          ] n8;
      node [ label="binding 1 - COMBINED_IMAGE_SAMPLER\n anv_sampled_image_descriptor" ] n9;
      node [ label="binding 2 - UNIFORM_BUFFER\n anv_address_range_descriptor"         ] n10;
      node [ label="binding 3 - UNIFORM_TEXEL_BUFFER\n anv_storage_image_descriptor"   ] n11;

      n8 -> n9 -> n10 -> n11 [style=invis];
    }
    subgraph cluster_5 {
      label = "Vulkan Objects"

      fontcolor="black";
      bgcolor="darkolivegreen4";

      subgraph cluster_6 {
        label = "VkImageView";

        bgcolor=darkolivegreen3;
        node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
               label="surface_state" ] n12;
      }
      subgraph cluster_7 {
        label = "VkSampler";

        bgcolor=darkolivegreen3;
        node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
               label="sample_state" ] n13;
      }
      subgraph cluster_8 {
        label = "VkImageView";
        bgcolor="darkolivegreen3";

        node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
               label="surface_state" ] n14;
      }
      subgraph cluster_9 {
        label = "VkBuffer";
        bgcolor=darkolivegreen3;

        node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
               label="address" ] n15;
      }
      subgraph cluster_10 {
        label = "VkBufferView";

        bgcolor=darkolivegreen3;
        node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
               label="surface_state" ] n16;
      }

      n12 -> n13 -> n14 -> n15 -> n16 [style=invis];
    }

    subgraph cluster_11 {
      subgraph cluster_12 {
        label = "CommandBuffer state stream";

        bgcolor="gold3";
        node [ style=filled,shape="box",fillcolor="white", fixedsize = true, width=2,
               label="surface_state" ] n17;
        node [ label="surface_state" ] n18;
        node [ label="surface_state" ] n19;

        n17 -> n18 -> n19 [style=invis];
      }
    }

    n3  -> n8 [lhead=cluster_2];

    n8  -> n12;
    n9  -> n13;
    n9  -> n14;
    n10 -> n15;
    n11 -> n16;

    n0 -> n17;
    n1 -> n18;
    n2 -> n19;
  }



The HW binding table is generated when draw or dispatch commands are
emitted. It can contain the following types of entries:

- The currently bound descriptor sets, one entry per descriptor set
  (our limit is 8).

- For dynamic buffers, one entry per dynamic buffer.

- For draw commands, render target entries if needed.

The HW binding table entries for descriptor sets reference a
``RENDER_SURFACE_STATE``, similar to what you would have for a normal
uniform buffer. The shader first reads this buffer to get the
information it needs to access a surface/sampler/etc. and then emits
the appropriate message using the information gathered from the
descriptor set buffer.

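Each binding type has an associated structure in the descriptor set
memory. In the example above, ``STORAGE_IMAGE`` and
``UNIFORM_TEXEL_BUFFER`` bindings use ``anv_storage_image_descriptor``,
``COMBINED_IMAGE_SAMPLER`` bindings use
``anv_sampled_image_descriptor`` and ``UNIFORM_BUFFER`` bindings use
``anv_address_range_descriptor``. This is the information read by the
shader.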
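The two-step access can be modeled on the CPU with the following
sketch. The ``sketch_*`` names are hypothetical, the descriptor fields
are made up (the real ``anv_sampled_image_descriptor`` differs), and
the binding table is simplified to an array of pointers to descriptor
set buffers instead of surface state offsets.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>
   #include <string.h>

   /* Hypothetical, simplified stand-in for anv_sampled_image_descriptor;
    * the real structure has different fields. */
   struct sketch_sampled_image_desc {
      uint32_t image_handle;   /* identifies the image surface state */
      uint32_t sampler_handle; /* identifies the sampler state */
   };

   /* Hypothetical model of the HW binding table: one entry per bound
    * descriptor set (ANV's limit is 8), each pointing at the set's
    * descriptor buffer. */
   struct sketch_binding_table {
      const uint8_t *set_buffers[8];
   };

   /* Step 1: find the descriptor set buffer through the binding table.
    * Step 2: read the descriptor at the binding's byte offset. The
    * caller would then use the fields to build the sampler message. */
   static struct sketch_sampled_image_desc
   sketch_read_sampled_image(const struct sketch_binding_table *bt,
                             unsigned set, uint32_t binding_offset,
                             uint32_t array_index)
   {
      assert(set < 8 && bt->set_buffers[set] != NULL);
      struct sketch_sampled_image_desc desc;
      memcpy(&desc,
             bt->set_buffers[set] + binding_offset +
                array_index * sizeof(desc),
             sizeof(desc));
      return desc;
   }

The extra indirection is what makes the model "bindless": the binding
table only needs one entry per descriptor set, no matter how many
descriptors each set contains.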


.. _`Binding tables`:

Binding Tables
--------------

Binding tables are arrays of 32-bit offset entries referencing surface
states. Shaders read or write a surface by referring to its binding
table entry. For example, fragment shaders often use entry 0 for the
first render target.
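As a minimal sketch (the function name is hypothetical, and any HW
alignment or encoding rules are ignored), resolving a binding table
entry amounts to adding its 32-bit offset to the surface state base
address:

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* A binding table entry is a 32-bit offset relative to the surface
    * state base address; the surface state lives at base + offset. */
   static uint64_t
   sketch_surface_state_addr(uint64_t surface_state_base,
                             const uint32_t *binding_table, unsigned index)
   {
      return surface_state_base + binding_table[index];
   }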

The way binding tables are managed is fairly awkward.

Each shader stage must have its binding table programmed through its
own ``3DSTATE_BINDING_TABLE_POINTERS_*`` instruction.

.. graphviz::

  digraph structs {
    node [shape=record];
    struct3 [label="{ binding tables&#92;n area | { <bt4> BT4 | <bt3> BT3 | ... | <bt0> BT0 } }|{ surface state&#92;n area |{<ss0> ss0|<ss1> ss1|<ss2> ss2|...}}"];
    struct3:bt0 -> struct3:ss0;
    struct3:bt0 -> struct3:ss1;
  }


The value programmed in the ``3DSTATE_BINDING_TABLE_POINTERS_*``
instructions is not a 64-bit pointer but an offset from the address
programmed in ``STATE_BASE_ADDRESS::Surface State Base Address`` or
``3DSTATE_BINDING_TABLE_POOL_ALLOC::Binding Table Pool Base Address``
(available on Gfx11+). The offset value in
``3DSTATE_BINDING_TABLE_POINTERS_*`` is also limited to a few bits
(not a full 32-bit value), meaning that as we use more and more
binding tables, we need to reposition ``STATE_BASE_ADDRESS::Surface
State Base Address`` to make space for new binding table arrays.

To make things even more awkward, the binding table entries are also
relative to ``STATE_BASE_ADDRESS::Surface State Base Address``, so
whenever we change ``STATE_BASE_ADDRESS::Surface State Base Address``
we need to adjust the binding table entries by the same offset.
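A minimal sketch of that adjustment, assuming a flat array of entries
and a hypothetical function name: each entry is turned back into an
absolute address using the old base, then re-expressed relative to the
new base, where it must still fit in 32 bits.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* When the surface state base address moves from old_base to
    * new_base, rewrite each entry so it still designates the same
    * surface state: new_entry = (old_base + old_entry) - new_base. */
   static void
   sketch_rebase_binding_table(uint32_t *entries, unsigned count,
                               uint64_t old_base, uint64_t new_base)
   {
      for (unsigned i = 0; i < count; i++) {
         uint64_t addr = old_base + entries[i];
         /* The surface state must stay reachable from the new base. */
         assert(addr >= new_base && addr - new_base <= UINT32_MAX);
         entries[i] = (uint32_t)(addr - new_base);
      }
   }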

We deal with this by allocating 4GB of address space (since the 32-bit
binding table entries can address 4GB of surface state elements). We
reserve the first gigabyte exclusively for binding tables, so that
wherever we position a binding table in that first gigabyte, it can
always refer to the surface states in the remaining 3GB.
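The arithmetic behind this split can be sketched as follows. The names
are hypothetical, addresses are taken relative to the start of the 4GB
area, and the sketch assumes the surface state base address is placed
at the binding table itself. Because the binding table sits below 1GB
and every surface state sits below 4GB, their difference always fits in
a 32-bit entry.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   #define SKETCH_GB          (1ull << 30)
   #define SKETCH_BT_AREA_END (1 * SKETCH_GB) /* binding tables: [0, 1GB) */
   #define SKETCH_SS_AREA_END (4 * SKETCH_GB) /* surface states: [1GB, 4GB) */

   /* Binding table entry for a surface state, assuming the surface
    * state base address is set to bt_base (the binding table location). */
   static uint32_t
   sketch_bt_entry(uint64_t bt_base, uint64_t surface_state_addr)
   {
      assert(bt_base < SKETCH_BT_AREA_END);
      assert(surface_state_addr >= SKETCH_BT_AREA_END &&
             surface_state_addr < SKETCH_SS_AREA_END);
      /* 0 <= bt_base < 1GB and surface_state_addr < 4GB, so the
       * difference is below 4GB and never overflows 32 bits. */
      return (uint32_t)(surface_state_addr - bt_base);
   }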


.. _`Descriptor Set Memory Layout`:

Descriptor Set Memory Layout
----------------------------

Here is a representation of how each element of each descriptor set
binding is mapped to the descriptor set memory:

.. graphviz::

  digraph structs {
    node [shape=record];
    rankdir=LR;

    struct1 [label="Descriptor Set | \
                    <b0> binding 0\n STORAGE_IMAGE \n (array_length=3) | \
                    <b1> binding 1\n COMBINED_IMAGE_SAMPLER \n (array_length=2) | \
                    <b2> binding 2\n UNIFORM_BUFFER \n (array_length=1) | \
                    <b3> binding 3\n UNIFORM_TEXEL_BUFFER \n (array_length=1)"];
    struct2 [label="Descriptor Set Memory | \
                    <b0e0> anv_storage_image_descriptor|\
                    <b0e1> anv_storage_image_descriptor|\
                    <b0e2> anv_storage_image_descriptor|\
                    <b1e0> anv_sampled_image_descriptor|\
                    <b1e1> anv_sampled_image_descriptor|\
                    <b2e0> anv_address_range_descriptor|\
                    <b3e0> anv_storage_image_descriptor"];

    struct1:b0 -> struct2:b0e0;
    struct1:b0 -> struct2:b0e1;
    struct1:b0 -> struct2:b0e2;
    struct1:b1 -> struct2:b1e0;
    struct1:b1 -> struct2:b1e1;
    struct1:b2 -> struct2:b2e0;
    struct1:b3 -> struct2:b3e0;
  }

Each binding in the descriptor set is allocated an array of
``anv_*_descriptor`` data structures. The type of ``anv_*_descriptor``
used for a binding is selected based on the ``VkDescriptorType`` of
the binding.

The value of ``anv_descriptor_set_binding_layout::descriptor_offset``
is a byte offset from the start of the descriptor set memory to the
associated binding. ``anv_descriptor_set_binding_layout::array_size``
is the number of ``anv_*_descriptor`` elements of the binding stored
in the descriptor set memory, starting at that offset.
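The following sketch computes ``descriptor_offset`` for the bindings of
the diagram above and the byte offset of an individual element. The
``sketch_*`` names are hypothetical, the descriptor sizes are made-up
placeholders, and bindings are packed back to back with no alignment
padding; none of these are ANV's actual values.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Descriptor kinds used by the example bindings. */
   enum sketch_desc_kind {
      SKETCH_STORAGE_IMAGE_DESC, /* STORAGE_IMAGE, UNIFORM_TEXEL_BUFFER */
      SKETCH_SAMPLED_IMAGE_DESC, /* COMBINED_IMAGE_SAMPLER */
      SKETCH_ADDRESS_RANGE_DESC, /* UNIFORM_BUFFER */
   };

   /* Placeholder sizes in bytes; NOT the real anv_*_descriptor sizes. */
   static uint32_t
   sketch_desc_size(enum sketch_desc_kind kind)
   {
      switch (kind) {
      case SKETCH_STORAGE_IMAGE_DESC: return 32;
      case SKETCH_SAMPLED_IMAGE_DESC: return 16;
      case SKETCH_ADDRESS_RANGE_DESC: return 16;
      }
      return 0;
   }

   /* Simplified mirror of the two binding layout fields discussed above. */
   struct sketch_binding_layout {
      enum sketch_desc_kind kind;
      uint32_t array_size;
      uint32_t descriptor_offset; /* filled by sketch_layout_set() */
   };

   /* Place the bindings back to back; returns the set size in bytes. */
   static uint32_t
   sketch_layout_set(struct sketch_binding_layout *bindings, unsigned count)
   {
      uint32_t offset = 0;
      for (unsigned i = 0; i < count; i++) {
         bindings[i].descriptor_offset = offset;
         offset += bindings[i].array_size * sketch_desc_size(bindings[i].kind);
      }
      return offset;
   }

   /* Byte offset, from the start of set memory, of one array element. */
   static uint32_t
   sketch_element_offset(const struct sketch_binding_layout *b, uint32_t index)
   {
      assert(index < b->array_size);
      return b->descriptor_offset + index * sketch_desc_size(b->kind);
   }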


Pipeline state emission
-----------------------

Vulkan initially started by baking as much state as possible into
pipelines. But with each new extension, more and more state has
become potentially dynamic.

ANV tries to limit the number of times an instruction has to be packed
to reprogram part of the 3D pipeline state. Packing happens in two
places:

- ``genX_pipeline.c``, where the non-dynamic state is emitted into the
  pipeline batch. Chunks of the batch are copied into the command
  buffer as a result of calling ``vkCmdBindPipeline()``, depending on
  what changed from the previously bound graphics pipeline.

- ``genX_gfx_state.c``, where the dynamic state is added to the
  instructions already packed in ``genX_pipeline.c``.
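One way to combine the two parts, shown here as a sketch under the
assumption that the static and dynamic fields of an instruction occupy
disjoint bits, is to pack each part separately and OR the resulting
dwords together. The function name and dword values are hypothetical.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Combine dwords pre-packed at pipeline creation (static fields) with
    * dwords packed at draw time (dynamic fields). Assumes the two sets
    * of fields never overlap, so OR-ing yields the full instruction. */
   static void
   sketch_merge_dwords(uint32_t *out, const uint32_t *pipeline_dw,
                       const uint32_t *dynamic_dw, unsigned num_dwords)
   {
      for (unsigned i = 0; i < num_dwords; i++) {
         /* Overlapping bits would mean a field was packed twice. */
         assert((pipeline_dw[i] & dynamic_dw[i]) == 0);
         out[i] = pipeline_dw[i] | dynamic_dw[i];
      }
   }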

The rule for deciding where to emit an instruction programming the 3D
pipeline is as follows:

- If any field of the instruction can be made dynamic, it should be
  emitted in ``genX_gfx_state.c``

- Otherwise, the instruction can be emitted in ``genX_pipeline.c``

When a piece of state programming is dynamic, it should have a
corresponding field in ``anv_gfx_dynamic_state``, and the
``genX(cmd_buffer_flush_gfx_runtime_state)`` function should be
updated to ensure we minimize the number of times an instruction is
emitted. Each instruction should have an associated
``ANV_GFX_STATE_*`` mask so that the dynamic emission code can tell
when to re-emit an instruction.
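A minimal sketch of this dirty-mask pattern follows. The
``SKETCH_GFX_STATE_*`` bits, the structure and the functions are
hypothetical stand-ins for ``ANV_GFX_STATE_*``,
``anv_gfx_dynamic_state`` and
``genX(cmd_buffer_flush_gfx_runtime_state)``; the real code tracks many
more fields and actually packs instructions.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Hypothetical dirty bits, one per HW instruction (names made up). */
   enum {
      SKETCH_GFX_STATE_VIEWPORT = 1u << 0,
      SKETCH_GFX_STATE_RASTER   = 1u << 1,
      SKETCH_GFX_STATE_BLEND    = 1u << 2,
   };

   /* Hypothetical stand-in for anv_gfx_dynamic_state. */
   struct sketch_dyn_state {
      float line_width; /* consumed by the "RASTER" instruction */
      uint32_t dirty;   /* instructions that must be re-emitted */
   };

   /* Only mark the instruction dirty if the value actually changed. */
   static void
   sketch_set_line_width(struct sketch_dyn_state *s, float width)
   {
      if (s->line_width != width) {
         s->line_width = width;
         s->dirty |= SKETCH_GFX_STATE_RASTER;
      }
   }

   /* Emit only the dirty instructions; here "emitting" just counts them. */
   static unsigned
   sketch_flush(struct sketch_dyn_state *s)
   {
      unsigned emitted = 0;
      for (uint32_t bits = s->dirty; bits != 0; bits &= bits - 1)
         emitted++; /* real code would pack and emit one instruction here */
      s->dirty = 0;
      return emitted;
   }

Setting a value that did not change leaves the mask untouched, so the
next flush emits nothing for it.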


Generated indirect draws optimization
-------------------------------------

Indirect draws have traditionally been implemented on Intel HW by
loading the indirect parameters from memory into HW registers using
the command streamer's ``MI_LOAD_REGISTER_MEM`` instruction before
dispatching a draw call to the 3D pipeline.

On recent products, the command streamer was found to be a performance
bottleneck, because it cannot dispatch draw calls fast enough to keep
the 3D pipeline busy.

The solution to this problem is to change the way we deal with
indirect draws. Instead of loading HW registers with values using the
command streamer, we generate an entire set of ``3DPRIMITIVE``
instructions using a shader. The generated instructions contain all
the draw call parameters. This way, the command streamer only executes
``3DPRIMITIVE`` instructions and doesn't load any data from memory or
touch HW registers, feeding the 3D pipeline as fast as it can.
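As a rough CPU model of what the generation shader does, the sketch
below turns an array of indirect parameters (laid out like
``VkDrawIndirectCommand``) into fully specified draw commands, one per
draw. The ``sketch_primitive`` structure is a hypothetical stand-in for
``3DPRIMITIVE``, whose real encoding differs.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   /* Same field order as VkDrawIndirectCommand in the Vulkan spec. */
   struct sketch_draw_indirect {
      uint32_t vertexCount;
      uint32_t instanceCount;
      uint32_t firstVertex;
      uint32_t firstInstance;
   };

   /* Hypothetical, simplified draw command with all parameters baked in
    * as immediates; the real 3DPRIMITIVE layout is different. */
   struct sketch_primitive {
      uint32_t vertex_count;
      uint32_t start_vertex;
      uint32_t instance_count;
      uint32_t start_instance;
   };

   /* One loop iteration plays the role of one shader invocation: it reads
    * the indirect parameters and writes a complete command, so the
    * command streamer never has to load registers from memory. */
   static void
   sketch_generate_draws(struct sketch_primitive *out,
                         const struct sketch_draw_indirect *in,
                         uint32_t draw_count)
   {
      for (uint32_t i = 0; i < draw_count; i++) {
         out[i].vertex_count   = in[i].vertexCount;
         out[i].start_vertex   = in[i].firstVertex;
         out[i].instance_count = in[i].instanceCount;
         out[i].start_instance = in[i].firstInstance;
      }
   }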

In ANV, this is implemented in two different ways:

The first way generates instructions directly into the command stream
using a side batch buffer. When ANV encounters the first indirect
draw, it emits a jump into the side batch, which contains one draw
call using a generation shader for each indirect draw. We keep adding
generation draws into the side batch until we have to stop due to the
end of the command buffer, a secondary command buffer call, or a
barrier containing the access flag
``VK_ACCESS_INDIRECT_COMMAND_READ_BIT``. The side batch buffer then
jumps back right after the instruction that jumped into it. Here is a
high level diagram showing how the generation batch buffer writes into
the main command buffer:

.. graphviz::

  digraph commands_mode {
    rankdir = "LR"
    "main-command-buffer" [
      label = "main command buffer|...|draw indirect0 start|<f0>jump to\ngeneration batch|<f1>|<f2>empty instruction0|<f3>empty instruction1|...|draw indirect0 end|...|draw indirect1 start|<f4>empty instruction0|<f5>empty instruction1|...|<f6>draw indirect1 end|..."
      shape = "record"
    ];
    "generation-command-buffer" [
      label = "generation command buffer|<f0>|<f1>write draw indirect0|<f2>write draw indirect1|...|<f3>exit jump"
      shape = "record"
    ];
    "main-command-buffer":f0 -> "generation-command-buffer":f0;
    "generation-command-buffer":f1 -> "main-command-buffer":f2 [color="#0000ff"];
    "generation-command-buffer":f1 -> "main-command-buffer":f3 [color="#0000ff"];
    "generation-command-buffer":f2 -> "main-command-buffer":f4 [color="#0000ff"];
    "generation-command-buffer":f2 -> "main-command-buffer":f5 [color="#0000ff"];
    "generation-command-buffer":f3 -> "main-command-buffer":f1;
  }
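The conditions that stop the accumulation of generation draws in the
side batch can be sketched as a predicate. The command enumeration and
the function are hypothetical; only the access bit value is taken from
the Vulkan headers (``VK_ACCESS_INDIRECT_COMMAND_READ_BIT`` is
``0x00000001``).

.. code-block:: c

   #include <assert.h>
   #include <stdbool.h>
   #include <stdint.h>

   /* Same value as VK_ACCESS_INDIRECT_COMMAND_READ_BIT in the Vulkan headers. */
   #define SKETCH_ACCESS_INDIRECT_COMMAND_READ_BIT 0x00000001u

   /* Hypothetical classification of recorded commands. */
   enum sketch_cmd {
      SKETCH_CMD_DRAW_INDIRECT,
      SKETCH_CMD_BARRIER,
      SKETCH_CMD_EXECUTE_SECONDARY,
      SKETCH_CMD_END_COMMAND_BUFFER,
   };

   /* True when accumulation must stop and the side batch must be closed
    * with its jump back; otherwise more generation draws can keep being
    * added to the same side batch. */
   static bool
   sketch_must_close_side_batch(enum sketch_cmd cmd, uint32_t barrier_access)
   {
      switch (cmd) {
      case SKETCH_CMD_END_COMMAND_BUFFER:
      case SKETCH_CMD_EXECUTE_SECONDARY:
         return true;
      case SKETCH_CMD_BARRIER:
         return (barrier_access & SKETCH_ACCESS_INDIRECT_COMMAND_READ_BIT) != 0;
      case SKETCH_CMD_DRAW_INDIRECT:
         return false;
      }
      return false;
   }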

The second way, used when the draw count is high, generates
instructions into a ring buffer of commands. This allows smaller
batches to be emitted. Here is a high level diagram showing how things
are executed:

.. graphviz::

  digraph ring_mode {
    rankdir=LR;
    "main-command-buffer" [
      label = "main command buffer|...| draw indirect |<f1>generation shader|<f2> jump to ring|<f3> increment\ndraw_base|<f4>..."
      shape = "record"
    ];
    "ring-buffer" [
      label = "ring buffer|<f0>generated draw0|<f1>generated draw1|<f2>generated draw2|...|<f3>exit jump"
      shape = "record"
    ];
    "main-command-buffer":f2 -> "ring-buffer":f0;
    "ring-buffer":f3 -> "main-command-buffer":f3;
    "ring-buffer":f3 -> "main-command-buffer":f4;
    "main-command-buffer":f3 -> "main-command-buffer":f1;
    "main-command-buffer":f1 -> "ring-buffer":f1 [color="#0000ff"];
    "main-command-buffer":f1 -> "ring-buffer":f2 [color="#0000ff"];
  }
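The ring mode loop from the diagram can be simulated on the CPU as
follows. This is only a sketch: the ring capacity, the function name
and the representation of draws as plain indices are assumptions made
for illustration. Each pass generates up to ``ring_size`` draws starting
at ``draw_base``, executes them from the ring, then increments
``draw_base`` and loops back to the generation shader.

.. code-block:: c

   #include <assert.h>
   #include <stdint.h>

   #define SKETCH_RING_CAPACITY 64 /* arbitrary cap for this sketch */

   /* Records the order in which draws execute in `executed` and returns
    * the number of generation passes needed. */
   static uint32_t
   sketch_ring_mode(uint32_t draw_count, uint32_t ring_size,
                    uint32_t *executed)
   {
      uint32_t ring[SKETCH_RING_CAPACITY];
      uint32_t passes = 0, n_exec = 0;

      assert(ring_size > 0 && ring_size <= SKETCH_RING_CAPACITY);

      for (uint32_t draw_base = 0; draw_base < draw_count;
           draw_base += ring_size) {
         uint32_t remaining = draw_count - draw_base;
         uint32_t n = remaining < ring_size ? remaining : ring_size;

         /* "generation shader": write draws draw_base..draw_base+n-1 */
         for (uint32_t i = 0; i < n; i++)
            ring[i] = draw_base + i;
         passes++;

         /* "jump to ring": execute the generated draws; the ring's exit
          * jump returns to the main batch, which increments draw_base */
         for (uint32_t i = 0; i < n; i++)
            executed[n_exec++] = ring[i];
      }
      return passes;
   }

With 10 draws and a ring of 4 entries, the loop runs 3 generation
passes and the draws still execute in their original order.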

Runtime dependencies
--------------------

Starting with Intel 12th generation/Alder Lake-P and Intel Arc Alchemist, the Intel 3D driver stack requires GuC firmware for proper operation. You have two options to install the firmware:

- Distro package: Install the pre-packaged firmware included in your Linux distribution's repositories.
- Manual download: You can download the firmware from the official repository: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915. Place the downloaded files in the ``/lib/firmware/i915`` directory.

Important: For optimal performance, we recommend updating the GuC firmware to version 70.6.3 or later.