File: SVM_GATHER.md

<!---======================= begin_copyright_notice ============================

Copyright (C) 2020-2022 Intel Corporation

SPDX-License-Identifier: MIT

============================= end_copyright_notice ==========================-->

## Opcode

  SVM = 0x4e
  GATHER = 0x03

## Format

| | | | | | | | |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0x4e(SVM) | 0x03(GATHER) | Exec_size | Pred | Block_size | Num_blocks | Addresses | Dst |


## Semantics


```

    for (i = 0; i < exec_size; ++i) {
        if (ChEn[i]) {
            if (block_size == 4 || block_size == 8) {
                for (j = 0; j < num_blocks; ++j) {
                    dst[j*exec_size+i] = *(Addresses[i] + j*block_size); // 4 or 8 bytes
                }
            }
            else { // block_size == 1
                UD min_block_size = num_blocks < 4 ? 4 : num_blocks;
                for (j = 0; j < num_blocks; ++j) {
                    dst[i*min_block_size+j] = *(Addresses[i] + j); // 1 byte
                }
            }
        }
    }
```
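
The pseudocode above can be exercised with a small host-side simulation. The following is a minimal sketch, not the compiler's or hardware's implementation: the function name `svm_gather`, the flat `memory` buffer standing in for the virtual address space, the little-endian interpretation of multi-byte blocks, and the use of `None` for unwritten destination elements are all assumptions made for illustration.

```python
def svm_gather(memory, addresses, ch_en, block_size, num_blocks):
    """Simulate the gather semantics over a flat byte buffer.

    memory     -- bytes-like object standing in for virtual memory (assumption)
    addresses  -- one byte address per channel; exec_size is len(addresses)
    ch_en      -- one enable flag per channel (ChEn in the pseudocode)
    Returns dst as a list of block-sized elements; None marks an
    element that the instruction did not write.
    """
    exec_size = len(addresses)
    if block_size in (4, 8):
        # Block j of channel i lands at element j*exec_size + i.
        dst = [None] * (exec_size * num_blocks)
        for i in range(exec_size):
            if not ch_en[i]:
                continue
            for j in range(num_blocks):
                start = addresses[i] + j * block_size
                chunk = memory[start:start + block_size]
                # Little-endian decoding is an assumption of this sketch.
                dst[j * exec_size + i] = int.from_bytes(chunk, "little")
    elif block_size == 1:
        # Each channel owns at least 4 byte slots in dst.
        min_block_size = max(num_blocks, 4)
        dst = [None] * (exec_size * min_block_size)
        for i in range(exec_size):
            if not ch_en[i]:
                continue
            for j in range(num_blocks):
                dst[i * min_block_size + j] = memory[addresses[i] + j]
    else:
        raise ValueError("block_size must be 1, 4, or 8")
    return dst


memory = bytes(range(32))
# Two channels, 4-byte blocks, 2 blocks per address.
words = svm_gather(memory, [0, 16], [True, True], 4, 2)
# Two channels, 1-byte blocks, 2 blocks per address.
octets = svm_gather(memory, [0, 16], [True, True], 1, 2)
```

With this input, `words` interleaves channels within each block row (`A0[0]`, `A1[0]`, `A0[1]`, `A1[1]`), while `octets` keeps each channel's bytes together in a 4-slot group, leaving the last two slots of each group unwritten.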

## Description





```
    Performs an 8-element scattered read from virtual addresses and stores the
    result into <dst>. Each element may consist of multiple blocks.
```


- **Exec_size(ub):** Execution size

  - Bit[2..0]: size of the region for source and destination operands

    - 0b000:  1 element (scalar)
    - 0b001:  2 elements
    - 0b010:  4 elements
    - 0b011:  8 elements
    - 0b100:  16 elements
  - Bit[7..4]: execution mask (explicit control over the enabled channels)

    - 0b0000:  M1
    - 0b0001:  M2
    - 0b0010:  M3
    - 0b0011:  M4
    - 0b0100:  M5
    - 0b0101:  M6
    - 0b0110:  M7
    - 0b0111:  M8
    - 0b1000:  M1_NM
    - 0b1001:  M2_NM
    - 0b1010:  M3_NM
    - 0b1011:  M4_NM
    - 0b1100:  M5_NM
    - 0b1101:  M6_NM
    - 0b1110:  M7_NM
    - 0b1111:  M8_NM

- **Pred(uw):** Predication control


- **Block_size(ub):**

  - Bit[1..0]: encodes the byte size of each block

    - 0b00:  1 byte
    - 0b01:  4 bytes
    - 0b11:  8 bytes

- **Num_blocks(ub):**

  - Bit[1..0]: encodes the number of blocks to read for each address

    - 0b00:  1 block
    - 0b01:  2 blocks
    - 0b10:  4 blocks
    - 0b11:  8 blocks (only valid for 4-byte blocks with execution size 8)

- **Addresses(raw_operand):** The general variable storing the addresses to read from. The first 8 elements of the operand are used as the virtual addresses, expressed in bytes. Each address must be aligned to <block_size>. Must have type UQ.


- **Dst(raw_operand):** The general variable storing the results of the read. The layout of the returned data depends on <block_size> and <num_blocks>; see the Notes section.
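
The bit-field encodings listed above can be decoded with a few table lookups. This is a minimal sketch for illustration only: the function names are hypothetical, and rejecting encodings that the list does not mention (for example `0b10` for Block_size) is an assumption rather than documented behavior.

```python
# Lookup tables transcribed from the field descriptions above.
EXEC_SIZES = {0b000: 1, 0b001: 2, 0b010: 4, 0b011: 8, 0b100: 16}
EXEC_MASKS = [f"M{n}" for n in range(1, 9)] + [f"M{n}_NM" for n in range(1, 9)]
BLOCK_SIZES = {0b00: 1, 0b01: 4, 0b11: 8}  # 0b10 is not listed in the text
NUM_BLOCKS = {0b00: 1, 0b01: 2, 0b10: 4, 0b11: 8}


def _lookup(table, code, field):
    # Assumption: an encoding the document does not list is an error.
    if code not in table:
        raise ValueError(f"unlisted {field} encoding: {code:#b}")
    return table[code]


def decode_exec_size(byte):
    """Return (element count, execution mask name) for an Exec_size byte."""
    size = _lookup(EXEC_SIZES, byte & 0b111, "Exec_size")  # Bit[2..0]
    mask = EXEC_MASKS[(byte >> 4) & 0b1111]                # Bit[7..4]
    return size, mask


def decode_block_size(byte):
    """Return the block size in bytes from Bit[1..0] of Block_size."""
    return _lookup(BLOCK_SIZES, byte & 0b11, "Block_size")


def decode_num_blocks(byte):
    """Return the number of blocks from Bit[1..0] of Num_blocks."""
    return _lookup(NUM_BLOCKS, byte & 0b11, "Num_blocks")
```

For example, `decode_exec_size(0x03)` yields `(8, "M1")`: the low three bits select 8 elements and the high nibble selects mask M1.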


#### Properties




## Text
```



    [(<P>)] SVM_GATHER.<block_size>.<num_blocks> (<exec_size>) <addresses> <dst>
```
## Notes





```
    4 byte gather:

        +-----------------+-----------------+----+----+-----+----+----+-----------------+
        | A0[0]           | A1[0]           |    |    | ... |    |    | A7[0]           |
        +-----------------+-----------------+----+----+-----+----+----+-----------------+
        |                                ...                                            |
        +-----------------+-----------------+----+----+-----+----+----+-----------------+
        | A0[#blocks-1]   | A1[#blocks-1]   |    |    | ... |    |    | A7[#blocks-1]   |
        +-----------------+-----------------+----+----+-----+----+----+-----------------+

    8 byte gather:

        +-----------------+-----------------+-----------------+-----------------+
        | A0[0]           | A1[0]           | A2[0]           | A3[0]           |
        +-----------------+-----------------+-----------------+-----------------+
        | A4[0]           | A5[0]           | A6[0]           | A7[0]           |
        +-----------------+-----------------+-----------------+-----------------+
        | ...                                                                   |
        +-----------------+-----------------+-----------------+-----------------+
        | A0[#blocks-1]   | A1[#blocks-1]   | A2[#blocks-1]   | A3[#blocks-1]   |
        +-----------------+-----------------+-----------------+-----------------+
        | A4[#blocks-1]   | A5[#blocks-1]   | A6[#blocks-1]   | A7[#blocks-1]   |
        +-----------------+-----------------+-----------------+-----------------+

    1 byte gather, 1/2/4 block (Ai[N] has undefined value if N>=num_blocks)

        +-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+
        | A0[0:3]   | A1[0:3]   | A2[0:3]   | A3[0:3]   | A4[0:3]   | A5[0:3]   | A6[0:3]   | A7[0:3]   |
        +-----------+-----------+-----------+-----------+-----------+-----------+-----------+-----------+

    1 byte gather, 8 block

        +-----------+-----------+-----------+-----------+
        | A0[0:7]   | A1[0:7]   | A2[0:7]   | A3[0:7]   |
        +-----------+-----------+-----------+-----------+
        | A4[0:7]   | A5[0:7]   | A6[0:7]   | A7[0:7]   |
        +-----------+-----------+-----------+-----------+

    Each table row represents one GRF, and A0[0] corresponds to the
    first element in <dst> in all of the above tables. The operand's
    type must be the same size as <block_size>.

    Note: when num_blocks > 1, exec_size must be >= 8.
```
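
The layouts in the tables above follow directly from the indexing in the Semantics section. The sketch below computes where block `N` of address `Ai` lands in `<dst>`. It is illustrative only: the helper names are hypothetical, and the 32-byte GRF size is an assumption inferred from the tables (eight 4-byte elements per row), not a value stated in this document.

```python
GRF_BYTES = 32  # assumption inferred from the tables: 8 x 4-byte elements per row


def dst_byte_offset(block_size, num_blocks, channel, block, exec_size=8):
    """Byte offset within <dst> of block `block` read for address `channel`."""
    if not 0 <= block < num_blocks:
        raise ValueError("block index out of range")
    if block_size in (4, 8):
        # Blocks are interleaved across channels: element j*exec_size + i.
        return (block * exec_size + channel) * block_size
    if block_size == 1:
        # Each channel owns a group of at least 4 bytes.
        stride = max(num_blocks, 4)
        return channel * stride + block
    raise ValueError("block_size must be 1, 4, or 8")


def grf_row(byte_offset):
    """Table row (GRF index) that contains the given byte offset."""
    return byte_offset // GRF_BYTES


def exec_size_ok(exec_size, num_blocks):
    """Check the constraint from the note: num_blocks > 1 needs exec_size >= 8."""
    return num_blocks == 1 or exec_size >= 8
```

Under these assumptions, `A4[0]` of an 8-byte gather sits at byte offset 32, which is the start of the second row, matching the 8 byte gather table.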