File: SVM_ATOMIC.md

<!---======================= begin_copyright_notice ============================

Copyright (C) 2020-2021 Intel Corporation

SPDX-License-Identifier: MIT

============================= end_copyright_notice ==========================-->

 

## Opcode

  SVM = 0x4e

  ATOMIC = 0x05

## Format

| | | | | | | | |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0x4e(SVM) | 0x05(ATOMIC) | Exec_size | Pred | Op | Addresses | Src0 | Src1 |
|           |              | Dst       |      |    |           |      |      |


## Semantics




                    for (i = 0; i < exec_size; ++i) {
                        if (ChEn[i]) {
                            // atomic 2-, 4-, or 8-byte read-modify-write per enabled channel
                            dst[i] = *(addresses[i]);
                            *(addresses[i]) = op(*(addresses[i]), src0[i], src1[i]);
                        }
                    }

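The pseudocode above can be exercised with a small reference model. This is a minimal sketch, not the compiler's or the hardware's implementation: memory is a plain Python dict, lanes are processed one after another, and the arithmetic chosen for each operation name (32-bit wrap-around, `sub` as `old - src0`, and so on) is an assumption, since this page lists only the operation names. The function name `svm_atomic` and the `OPS` table are hypothetical.

```python
MASK32 = 0xFFFFFFFF

# Assumed meanings for a subset of the listed operations (32-bit only).
# Each entry maps (old value, src0, src1) to the new value stored in memory.
OPS = {
    "add": lambda old, s0, s1: (old + s0) & MASK32,
    "sub": lambda old, s0, s1: (old - s0) & MASK32,
    "inc": lambda old, s0, s1: (old + 1) & MASK32,
    "dec": lambda old, s0, s1: (old - 1) & MASK32,
    "xchg": lambda old, s0, s1: s0 & MASK32,
    "and": lambda old, s0, s1: old & s0,
    "or": lambda old, s0, s1: old | s0,
    "xor": lambda old, s0, s1: old ^ s0,
}


def svm_atomic(memory, op, ch_en, addresses, src0=None, src1=None):
    """Apply `op` at each enabled lane's address and return the old values.

    Disabled lanes leave memory untouched and report None in the result,
    standing in for "dst[i] is not written".
    """
    fn = OPS[op]
    dst = [None] * len(addresses)
    for i, addr in enumerate(addresses):
        if not ch_en[i]:
            continue
        old = memory.get(addr, 0)
        dst[i] = old  # the old value is returned
        s0 = src0[i] if src0 is not None else 0  # None models the null variable V0
        s1 = src1[i] if src1 is not None else 0
        memory[addr] = fn(old, s0, s1)
    return dst


memory = {0x1000: 5, 0x1008: 7}
old_values = svm_atomic(memory, "add", [True, False], [0x1000, 0x1008], src0=[3, 4])
```

In this run only lane 0 is enabled, so `old_values` holds the previous contents of address `0x1000` and the value at `0x1008` is left unchanged.
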
## Description


    Performs 8 element scattered atomic "read-modify-write" operations to
    <addresses>. The values written depend on the atomic operation being
    performed, and the old values from the address are returned.

- **Exec_size(ub):** Execution size
 
  - Bit[2..0]: size of the region for source and destination operands
 
    - 0b000:  1 element (scalar) 
    - 0b001:  2 elements 
    - 0b010:  4 elements 
    - 0b011:  8 elements 
  - Bit[7..4]: execution mask (explicit control over the enabled channels)
 
    - 0b0000:  M1 
    - 0b0001:  M2 
    - 0b0010:  M3 
    - 0b0011:  M4 
    - 0b0100:  M5 
    - 0b0101:  M6 
    - 0b0110:  M7 
    - 0b0111:  M8 
    - 0b1000:  M1_NM 
    - 0b1001:  M2_NM 
    - 0b1010:  M3_NM 
    - 0b1011:  M4_NM 
    - 0b1100:  M5_NM 
    - 0b1101:  M6_NM 
    - 0b1110:  M7_NM 
    - 0b1111:  M8_NM
- **Pred(uw):** Predication control

- **Op(ub):** 
 
  - Bit[4..0]: encodes the atomic operation for this message
 
    - 0b00000:  add 
    - 0b00001:  sub 
    - 0b00010:  inc 
    - 0b00011:  dec 
    - 0b00100:  min 
    - 0b00101:  max 
    - 0b00110:  xchg 
    - 0b00111:  cmpxchg 
    - 0b01000:  and 
    - 0b01001:  or 
    - 0b01010:  xor 
    - 0b01011:  imin 
    - 0b01100:  imax 
    - 0b01101:  predec 
    - 0b10000:  fmax 
    - 0b10001:  fmin 
    - 0b10010:  fcmpwr 
  - {TGLLP+}Bit[6..5]: encodes the data width of the atomic operation
 
    - 0b00:  32-bit 
    - 0b01:  16-bit 
    - 0b10:  64-bit
- **Addresses(raw_operand):** The general variable storing the virtual addresses. The first 8 elements of the variable will be used, and each address is in units of bytes. Each address must be Dword or Qword aligned depending on the type of Dst. Must have type UQ.

- **Src0(raw_operand):** For the INC and DEC atomic operations it must be V0 (the null variable). For the other operations the first <exec_size> elements of the variable will be used as src0.

- **Src1(raw_operand):** For the CMPXCHG and FCMPWR operations the first <exec_size> elements of the variable will be used as src1. For all other operations it must be V0.

- **Dst(raw_operand):** The raw operand storing the results of the atomic operation. Dst is permitted to be V0, in which case no value will be returned.

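The bit fields of the Exec_size and Op bytes described above can be decoded as in the following sketch. The function and table names are hypothetical, and the code only restates the encodings listed on this page: bit 3 of Exec_size is not described here and is ignored, and the data-width field of Op is documented for TGLLP+ only. `required_sources` follows the Src0 and Src1 rules literally.

```python
# Encodings copied from the field descriptions above.
EXEC_ELEMENTS = {0b000: 1, 0b001: 2, 0b010: 4, 0b011: 8}
MASK_NAMES = [f"M{i}" for i in range(1, 9)] + [f"M{i}_NM" for i in range(1, 9)]
ATOMIC_OPS = {
    0b00000: "add", 0b00001: "sub", 0b00010: "inc", 0b00011: "dec",
    0b00100: "min", 0b00101: "max", 0b00110: "xchg", 0b00111: "cmpxchg",
    0b01000: "and", 0b01001: "or", 0b01010: "xor", 0b01011: "imin",
    0b01100: "imax", 0b01101: "predec", 0b10000: "fmax", 0b10001: "fmin",
    0b10010: "fcmpwr",
}
DATA_WIDTHS = {0b00: 32, 0b01: 16, 0b10: 64}


def decode_exec_size(byte):
    """Return (element count, execution mask name) for an Exec_size byte."""
    size_bits = byte & 0b111          # Bit[2..0]
    mask_bits = (byte >> 4) & 0b1111  # Bit[7..4]
    if size_bits not in EXEC_ELEMENTS:
        raise ValueError(f"unsupported execution size encoding: {size_bits:#05b}")
    return EXEC_ELEMENTS[size_bits], MASK_NAMES[mask_bits]


def decode_op(byte):
    """Return (operation name, data width in bits) for an Op byte."""
    op_bits = byte & 0b11111          # Bit[4..0]
    width_bits = (byte >> 5) & 0b11   # Bit[6..5], TGLLP+ only
    if op_bits not in ATOMIC_OPS or width_bits not in DATA_WIDTHS:
        raise ValueError(f"unsupported Op encoding: {byte:#04x}")
    return ATOMIC_OPS[op_bits], DATA_WIDTHS[width_bits]


def required_sources(op):
    """Return (uses_src0, uses_src1); an unused source must be V0."""
    if op in ("inc", "dec"):
        return (False, False)
    if op in ("cmpxchg", "fcmpwr"):
        return (True, True)
    return (True, False)
```

For example, `decode_exec_size(0x13)` selects 8 elements under mask `M2`, and `decode_op(0x47)` selects a 64-bit `cmpxchg`.
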
#### Properties


## Text
```
    [(<P>)] SVM_ATOMIC.<op>[.16|.64] (<exec_size>) <addresses> <dst> <src0> <src1>  // if neither .16 nor .64 is specified, a 32-bit atomic is implied
```
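As an illustration of the text form above, the following sketch assembles one instruction line from its parts. `format_svm_atomic` is a hypothetical helper, not part of any Intel tool, and the operand spellings used in the example (`V40.0`, `M1, 8`, `P1`) are assumptions made for illustration; only the overall template comes from this page.

```python
def format_svm_atomic(op, exec_size, addresses, dst,
                      src0="V0", src1="V0", width=32, pred=None):
    """Build a text line following the SVM_ATOMIC template shown above.

    Unused sources default to V0, the null variable. The width suffix is
    omitted for 32-bit atomics, matching the comment in the template.
    """
    suffixes = {32: "", 16: ".16", 64: ".64"}
    if width not in suffixes:
        raise ValueError(f"unsupported data width: {width}")
    prefix = f"({pred}) " if pred else ""
    return (f"{prefix}SVM_ATOMIC.{op}{suffixes[width]} ({exec_size}) "
            f"{addresses} {dst} {src0} {src1}")


line = format_svm_atomic("add", "M1, 8", "V40.0", "V41.0", src0="V42.0")
```

Keeping the width suffix in a lookup table makes the "no suffix means 32-bit" rule explicit and rejects widths the syntax cannot express.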



## Notes



    -   For 16-bit atomics, the source operand shall be in unpacked form: only the lower 16 bits of each dword are used. The writeback (if any) has the same unpacked layout.
    -   For 64-bit atomics, the src and dst operands must have 64-bit types if they are not null.

    -   Dst, Src0, and Src1, if present, must have the same type, and the type requirements vary based on the operation:
            - IMIN, IMAX: type D, Q
            - FMAX, FMIN, FCMPWR: type F
            - Other operations: type UD, UQ

    - **16-bit atomics:** the operand types remain the same as for 32-bit atomics, and the lower 16 bits of each dword are reinterpreted as the corresponding 16-bit type. Each access is a word and must be word-aligned.
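The unpacked layout and the type rules in these notes can be sketched as follows. The helper names are hypothetical. Two details are assumptions rather than statements from this page: the byte order is taken to be little-endian, and the upper half of each dword is zero-filled when writing (the notes only say that the upper half is not used).

```python
import struct


def to_unpacked(values16):
    """Place each 16-bit value in the low half of its own dword."""
    dwords = [v & 0xFFFF for v in values16]
    return struct.pack(f"<{len(dwords)}I", *dwords)


def from_unpacked(raw):
    """Read back the low 16 bits of each dword, ignoring the upper half."""
    count = len(raw) // 4
    return [d & 0xFFFF for d in struct.unpack(f"<{count}I", raw[:count * 4])]


def allowed_types(op):
    """Operand types permitted for Dst, Src0 and Src1, per the notes above."""
    if op in ("imin", "imax"):
        return {"D", "Q"}
    if op in ("fmax", "fmin", "fcmpwr"):
        return {"F"}
    return {"UD", "UQ"}


def types_ok(op, *operand_types):
    """Check that all non-null operands share one type allowed for `op`.

    A null operand (V0) is passed as None and skipped.
    """
    present = {t for t in operand_types if t is not None}
    return len(present) <= 1 and present <= allowed_types(op)


layout = to_unpacked([0x1234, 0xABCD])
```

Here `layout` is eight bytes long: each 16-bit value occupies the first two bytes of its dword, and `from_unpacked` recovers the values even when the unused upper halves hold other data.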